Abstract

We consider the well-studied problem of finding a perfect matching in $d$-regular bipartite graphs with $2n$ vertices and $m = nd$ edges. While the best-known algorithm for general bipartite graphs (due to Hopcroft and Karp) takes $O(m\sqrt{n})$ time, in regular bipartite graphs, a perfect matching is known to be computable in $O(m)$ time. Very recently, the $O(m)$ bound was improved to $O(\min\{m, \frac{n^{2.5} \ln n}{d}\})$ expected time, an expression that is bounded by $\tilde{O}(n^{1.75})$. In this paper, we further improve this result by giving an $O(\min\{m, \frac{n^2 \ln^3 n}{d}\})$ expected time algorithm for finding a perfect matching in regular bipartite graphs; as a function of $n$ alone, the algorithm takes expected time $O((n \ln n)^{1.5})$.

To obtain this result, we design and analyze a two-stage sampling scheme that reduces the problem of finding a perfect matching in a regular bipartite graph to the same problem on a subsampled bipartite graph with $O(n \ln n)$ edges. The first stage is a sub-linear time uniform sampling that reduces the size of the input graph while maintaining certain structural properties of the original graph. The second stage is a non-uniform sampling that takes linear time (on the reduced graph) and outputs a graph with $O(n \ln n)$ edges, while preserving a matching with high probability. This matching is then recovered using the Hopcroft-Karp algorithm. While the standard analysis of Hopcroft-Karp also gives us an $\tilde{O}(n^{1.5})$ running time, we present a tighter analysis for our special case that results in the stronger $\tilde{O}(\min\{m, \frac{n^2}{d}\})$ time mentioned earlier.

Our proof of correctness of this sampling scheme uses a new correspondence theorem between cuts and Hall's theorem "witnesses" for a perfect matching in a bipartite graph that we prove. We believe this theorem may be of independent interest; as another example application, we show that a perfect matching in the support of an $n \times n$ doubly stochastic matrix with $m$ non-zero entries can be found in expected time $\tilde{O}(m + n^{1.5})$.

1 Introduction



A bipartite graph $G = (P, Q, E)$ with vertex set $P \cup Q$ and edge set $E \subseteq P \times Q$ is said to be regular if every vertex has the same degree $d$. We use $m = nd$ to denote the number of edges in $G$ and $n$ to represent the number of vertices in $P$ (as a consequence of regularity, $P$ and $Q$ have the same size). Regular bipartite graphs
are a fundamental combinatorial object, and arise, among other things, in expander constructions, scheduling, 
routing in switch fabrics, and task-assignment |[T5l [Tir61. 

A regular bipartite graph of degree $d$ can be decomposed into exactly $d$ perfect matchings, a fact that is an easy consequence of Hall's theorem [4], and is closely related to the Birkhoff-von Neumann decomposition of a doubly stochastic matrix [3, 17]. Finding a matching in a regular bipartite graph is a well-studied problem, starting with the algorithm of König in 1916 [13], which is now known to run in time $O(mn)$. The well-known bipartite matching algorithm of Hopcroft and Karp [9] can be used to obtain a running time of $O(m\sqrt{n})$. In graphs where $d$ is a power of 2, the following elegant idea, due to Gabow and Kariv [7], leads to an algorithm with $O(m)$ running time. First, compute an Euler tour of the graph (in time $O(m)$) and then follow this tour in an arbitrary direction. Exactly half the edges will go from left to right; these form a regular bipartite graph of degree $d/2$. The total running time $T(m)$ thus follows the recurrence $T(m) = O(m) + T(m/2)$, which yields $T(m) = O(m)$. Extending this idea to the general case proved quite hard, and after a series of improvements (e.g., by Cole and Hopcroft [5], and then by Schrijver [16] to $O(md)$), Cole, Ost, and Schirra [6] gave an $O(m)$ algorithm for the case of general $d$. Their main interest was in edge coloring of general bipartite graphs, where finding perfect matchings in regular bipartite graphs is an important subroutine. Very recently, Goel, Kapralov, and Khanna [8] gave a sampling-based algorithm that computes a perfect matching in $d$-regular bipartite graphs in $O(\min\{m, \frac{n^{2.5} \ln n}{d}\})$ expected time, an expression that is bounded by $\tilde{O}(n^{1.75})$. The algorithm of [8] uses uniform sampling to reduce the number of edges in the input graph while preserving a perfect matching, and then runs the Hopcroft-Karp algorithm on the sampled graph.
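The Euler-tour halving idea for $d$ a power of 2 can be sketched in a few lines. This is only an illustration, not the implementation from the cited work: the function names `euler_halve` and `matching_power_of_two` and the edge-list representation (left vertex `u`, right vertex `v`, both numbered `0..n-1`) are assumptions made for this example. Instead of building one explicit Euler tour, the sketch walks closed trails, which is enough to keep exactly half of the edges at every vertex.

```python
def euler_halve(n, edges):
    """Keep the edges that are traversed from P to Q along closed trails.

    `edges` is a list of (u, v) pairs with u in P and v in Q, both numbered
    0..n-1. Every vertex must have even degree; parallel edges are allowed.
    """
    # Internally, vertices 0..n-1 are P and vertices n..2n-1 are Q.
    incident = [[] for _ in range(2 * n)]
    for idx, (u, v) in enumerate(edges):
        incident[u].append(idx)
        incident[n + v].append(idx)
    used = [False] * len(edges)
    kept = []
    for start in range(2 * n):
        cur = start
        while True:
            # Drop edges that were already consumed from their other endpoint.
            while incident[cur] and used[incident[cur][-1]]:
                incident[cur].pop()
            if not incident[cur]:
                break  # with even degrees the walk can only get stuck at `start`
            idx = incident[cur].pop()
            used[idx] = True
            u, v = edges[idx]
            if cur < n:
                kept.append((u, v))  # leaving a P-vertex: a left-to-right edge
                cur = n + v
            else:
                cur = u              # leaving a Q-vertex: a right-to-left edge
    return kept


def matching_power_of_two(n, edges):
    """Perfect matching of a d-regular bipartite graph, d a power of 2."""
    d = len(edges) // n
    assert d >= 1 and d * n == len(edges) and d & (d - 1) == 0
    while d > 1:
        edges = euler_halve(n, edges)  # d-regular -> (d/2)-regular
        d //= 2
    return edges
```

Each call to `euler_halve` touches every edge a constant number of times, so the total work follows the recurrence $T(m) = O(m) + T(m/2)$ stated above.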

Our Results and Techniques: We present a significantly faster algorithm for finding perfect matchings in 
regular bipartite graphs. 

Theorem 1.1 There is an $O\left(\min\{m, \frac{n^2 \ln^3 n}{d}\}\right)$ expected time algorithm to find a perfect matching in a $d$-regular bipartite graph $G$.

As a function of $n$ alone, the running time stated above is $O((n \ln n)^{1.5})$. Since the $O(m)$ running time is guaranteed by the algorithm of Cole, Ost, and Schirra, we are only concerned with the case where $d$ is $\Omega(\sqrt{n} \ln n)$. For this regime, our algorithm reduces the perfect matching problem on a regular bipartite graph $G$ to the same problem on a (not necessarily regular) sparse bipartite graph $H$ with $O(n \ln n)$ edges. This reduction takes time $O(\frac{n^2 \ln^3 n}{d})$. We then use the Hopcroft-Karp algorithm on $H$ to recover a perfect matching. A black-box use of the analysis of the Hopcroft-Karp algorithm would suggest a running time of $O(\frac{n^2 \ln^3 n}{d} + n^{1.5} \ln n)$. However, we show that the final sampled graph has some special structure that guarantees that the Hopcroft-Karp algorithm would complete in time $O(\frac{n^2 \ln^2 n}{d})$ whp.
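The passage from the two-term bound to a bound in $n$ alone can be made explicit with a short balancing argument. The derivation below is a sketch that assumes the bound of Theorem 1.1 reads $\min\{m, n^2 \ln^3 n / d\}$ with $m = nd$; the symbol $d^*$ for the balancing degree is introduced here for illustration only.

```latex
% Balance the two terms of min{ nd, n^2 ln^3 n / d }.
\[
  n d^* = \frac{n^2 \ln^3 n}{d^*}
  \quad\Longleftrightarrow\quad
  (d^*)^2 = n \ln^3 n
  \quad\Longleftrightarrow\quad
  d^* = \sqrt{n}\,\ln^{1.5} n .
\]
% For d <= d^* the first term is at most n d^*; for d >= d^* the second
% term is at most n^2 ln^3 n / d^*. Both equal the same quantity:
\[
  \min\Bigl\{ nd,\ \frac{n^2 \ln^3 n}{d} \Bigr\}
  \;\le\; n d^* \;=\; n^{1.5} \ln^{1.5} n \;=\; (n \ln n)^{1.5}
  \qquad \text{for every } d .
\]
```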

For every pair $A \subseteq P$, $B \subseteq Q$, we define a witness set $W(A, B)$ to be the set of all edges going from $A$ to $Q \setminus B$. Of particular interest are what we call Hall witness sets, which correspond to $|A| > |B|$; the well-known Hall's theorem [4] says that a bipartite graph $H(P, Q, E_H)$ contains a perfect matching iff $E_H$ includes an edge from each Hall witness set. Thus any approach that reduces the size of the input bipartite graph by sampling must ensure that some edge from every Hall witness set is included in the sampled graph; otherwise the sampled graph no longer contains a perfect matching. Goel, Kapralov, and Khanna [8] showed that no uniform sampling scheme on a $d$-regular bipartite graph can reduce the number of edges to $o(\frac{n^2}{d \ln n})$ while preserving a perfect matching, and hence their $\tilde{O}(n^{1.75})$-time algorithm is the best possible running time achievable via uniform sampling followed by a black-box invocation of the Hopcroft-Karp analysis.
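The Hall witness characterization can be checked by brute force on tiny graphs. The sketch below is an illustration only: it enumerates all pairs $(A, B)$ with $|A| > |B|$, so it is exponential in $n$, and the function names are assumptions made for this example. It compares the witness-set test against a direct search over all permutations.

```python
from itertools import combinations, permutations


def witness_set(edges, A, B):
    """W(A, B): edges that go from A to Q minus B."""
    return {(u, v) for (u, v) in edges if u in A and v not in B}


def hits_every_hall_witness(n, edges):
    """True iff `edges` meets W(A, B) for every pair with |A| > |B|."""
    verts = range(n)
    for a in range(1, n + 1):
        for A in combinations(verts, a):
            for b in range(a):  # every size |B| < |A|
                for B in combinations(verts, b):
                    if not witness_set(edges, set(A), set(B)):
                        return False
    return True


def has_perfect_matching(n, edges):
    """Direct check: try every bijection from P to Q."""
    edge_set = set(edges)
    return any(all((i, p[i]) in edge_set for i in range(n))
               for p in permutations(range(n)))
```

On any small input the two functions should agree, which is exactly the statement of Hall's theorem in the witness-set form used above.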
In order to get past this barrier, we use here a two-stage sampling process. The first stage is a uniform sampling (along the lines of [8]) which generates a reduced-size graph $G' = (P, Q, E')$ that preserves not only a perfect matching but also a key relationship between the sizes of "relevant" witness sets and cuts in the graph $G$. The second stage is to run the non-uniform Benczur-Karger sampling scheme [2] on $G'$ to generate a graph $G''$ with $\tilde{O}(n)$ edges while preserving a perfect matching w.h.p. Since this step requires $\tilde{O}(|E'|)$ time, we crucially rely on the fact that $G'$ does not contain too many edges.

While our algorithm is easy to state and understand, the proof of correctness is quite involved. The Benczur-Karger sampling was developed to generate, for any graph, a weighted subgraph with $\tilde{O}(n)$ edges that approximately preserves the size of all cuts in the original graph. The central idea underlying our result is to show that there exists a collection of core witness sets that can be identified in an almost one-one manner with cuts in the graph such that the probability mass of edges in each witness set is comparable to the probability mass of the edges in the cut identified with it. Further, every witness set in the graph has a "representative" in this collection of core witness sets. Informally, this allows us to employ cut-preserving sampling schemes such as Benczur-Karger as "witness-preserving" schemes. We note here that the natural mapping which assigns the witness set of a pair $(A, B)$ to the cut edges associated with this pair can map arbitrarily many witness sets to the same cut and is not useful for our purposes. One of our contributions is an uncrossing theorem for witness sets, which we refer to as the proportionate uncrossing theorem. Informally speaking, it says that given any collection of witness sets $\mathcal{R}$ such that the probability mass of each witness set is comparable to that of its associated cut, there exists another collection $\mathcal{T}$ of witness sets such that (i) the natural mapping to cuts as defined above is half-injective for $\mathcal{T}$, that is, at most two witness sets in $\mathcal{T}$ map to any given cut, (ii) the probability mass of each witness set is comparable to the probability mass of its associated cut, and (iii) any subset of edges that hits every witness set in $\mathcal{T}$ also hits every witness set in $\mathcal{R}$. The collection $\mathcal{T}$ is referred to as a proportionate uncrossing of $\mathcal{R}$. As shown in Figure 1(a), we cannot achieve an injective mapping, and hence the half-injectivity is unavoidable.

We believe the half-injective correspondence between witness sets and cuts, as facilitated by the proportionate uncrossing theorem, is of independent interest, and will perhaps have other applications in this space of problems. We also emphasize here that the uncrossing theorem holds for all bipartite graphs, and not only regular bipartite graphs. Indeed, the graph $G'$ on which we invoke this theorem does not inherit the regularity property of the original graph $G$. As another illustrative example, consider the celebrated Birkhoff-von Neumann theorem [3, 17], which says that every doubly stochastic matrix can be expressed as a convex combination of permutation matrices (i.e., perfect matchings). In some applications, it is of interest to do an iterative decomposition whereby a single matching is recovered in each iteration. The best-known bound for this problem, to our knowledge, is an $O(mh)$ time algorithm that follows from the work of Gabow and Kariv [7]; here $h$ denotes the maximum number of bits needed to express any entry in $M$. The following theorem is an easy consequence of our proportionate uncrossing result.

Theorem 1.2 Given an $n \times n$ doubly-stochastic matrix $M$ with $m$ non-zero entries, one can find a perfect matching in the support of $M$ in $\tilde{O}(m + n^{1.5})$ expected time.

The proof of this theorem and a discussion of known results about this problem are given in Section 6. Though this result itself represents only a modest improvement over the earlier $O(mh)$ running time, it is an instructive illustration of the utility of the proportionate uncrossing theorem.

It is worth noting that while the analysis of Goel, Kapralov, and Khanna was along broadly similar lines (sample edges from the original graph, followed by running the Hopcroft-Karp algorithm), the proportionate uncrossing theorem developed in this paper requires significant new ideas and is crucial to incorporating the non-uniform sampling stage into our algorithm. Further, the running time of the Hopcroft-Karp algorithm is easily seen to be $\Omega(m\sqrt{n})$ even for the 2-regular graph consisting of $\Theta(\sqrt{n})$ disjoint cycles of lengths $2, 4, \ldots, \sqrt{n}$ respectively; the stronger analysis for our special case requires both our uncrossing theorem as well as a stronger decomposition.¹ As a step in this analysis, we prove the independently interesting fact that
after sampling edges from a d-regular bipartite graph with rate , for some suitable constant c, we obtain a 
graph that has a matching of size $n - O(n/d)$ whp, and such a matching can be found in $O(n/d)$ augmenting phases of the Hopcroft-Karp algorithm whp.



Organization: Section 2 reviews and presents some useful corollaries of relevant earlier work. In Section 3, we establish the proportionate uncrossing theorem. In Section 4, we present and analyze our two-stage sampling scheme, and Section 5 outlines the stronger analysis of the Hopcroft-Karp algorithm for our special case. Section 6 contains the proof of Theorem 1.2 and a discussion of known results on finding perfect matchings in the support of doubly stochastic matrices.



2 Preliminaries 

In this section, we adapt and present recent results of Goel, Kapralov, and Khanna [8] as well as the Benczur-Karger sampling theorem [2] for our purposes, and also prove a simple technical lemma for later use.

2.1 Bipartite Decompositions and Relevant Witness Pairs 

Let $G = (P, Q, E)$ be a regular bipartite graph, with vertex set $P \cup Q$ and edge set $E \subseteq P \times Q$. Consider any partition of $P$ into $k$ sets $P_1, P_2, \ldots, P_k$, and a partition of $Q$ into $Q_1, Q_2, \ldots, Q_k$. Let $G_i$ denote the (not necessarily regular) bipartite graph $(P_i, Q_i, E_i)$ where $E_i = E \cap (P_i \times Q_i)$. We will call this a "decomposition" of $G$.

Given $A \subseteq P$ and $B \subseteq Q$, define the witness set corresponding to the pair $(A, B)$, denoted $W(A, B)$, as the set of all edges between $A$ and $Q \setminus B$, and define the cut $C(A, B)$ as the set of all edges between $A \cup B$ and $(P \setminus A) \cup (Q \setminus B)$. The rest of the definitions in this section are with respect to some arbitrary but fixed decomposition of $G$.

Definition 2.1 An edge $(u, v) \in E$ is relevant if $(u, v) \in E_i$ for some $i$.

Definition 2.2 Let $E_R$ be the set of all relevant edges. A pair $(A, B)$ is said to be relevant if

1. $A \subseteq P_i$ and $B \subseteq Q_i$ for some $i$,

2. $|A| > |B|$, and

3. There does not exist another $A' \subseteq P_i$, $B' \subseteq Q_i$, such that $A' \subset A$, $|A'| > |B'|$, and $W(A', B') \cap E_R \subseteq W(A, B) \cap E_R$.

Informally, a relevant pair is one which is contained completely within a single piece in the decomposition, and is "minimal" with respect to that piece. The following lemma is implicit in [8] and is proved in Appendix A for completeness.

Lemma 2.3 Let $\mathcal{R}$ denote all relevant pairs $(A, B)$ with respect to a decomposition of $G(P, Q, E)$, and let $E_R$ denote all relevant edges. Consider any graph $G^* = (P, Q, E^*)$. If for all $(A, B) \in \mathcal{R}$, we have $W(A, B) \cap E^* \cap E_R \neq \emptyset$, then $G^*$ has a perfect matching.

¹ It is known that the Hopcroft-Karp algorithm terminates quickly on bipartite expanders [14], but those techniques don't help in our setting since we start with an arbitrary regular bipartite graph.
2.2 A Corollary of Benczur-Karger Sampling Scheme

The Benczur-Karger sampling theorem [2] shows that for any graph, a relatively small non-uniform edge sampling rate suffices to ensure that every cut in the graph is hit by the sampled edges (i.e., it has a non-empty intersection) with high probability. The sampling rate used for each edge $e$ depends inversely on its strength, as defined below.

Definition 2.4 A $k$-strong component of a graph $H$ is a maximal vertex-induced subgraph of $H$ with edge-connectivity $k$. The strength of an edge $e$ in a graph $H$ is the maximum value of $k$ such that a $k$-strong component contains $e$.

Definition 2.5 Given a graph $H = (V, E)$, let $H_{[j]} = (V, E_{[j]})$ denote the subgraph of $H$ restricted to edges of strength $j$ or higher, where $j$ is some integer in $\{1, 2, \ldots, |V|\}$.

It is easy to see that whenever a cut in a graph $H(V, E)$ contains an edge of strength $k$, then the cut must contain at least $k$ edges. Furthermore, for any $1 < j \le |V|$, each connected component of graph $H_{[j]}$ is contained inside some connected component of $H_{[j-1]}$. The Benczur-Karger theorem utilizes these properties to show that it suffices to sample each edge $e$ with probability $O(\min\{1, \ln n / s_e\})$, where $s_e$ denotes the strength of $e$.
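Edge strength can be computed by brute force on very small graphs, which helps in sanity-checking the definitions above. The sketch below is not the algorithm of [2]: it enumerates every vertex subset, so it is exponential and meant only for toy inputs. It assumes a simple graph given as a list of distinct `(u, v)` pairs, and the constant `c` in `sampling_probability` is a hypothetical placeholder for the unspecified constant in the sampling rate.

```python
from itertools import combinations
from math import log


def edge_connectivity(vertices, edges):
    """Minimum number of induced edges crossing any bipartition of `vertices`."""
    vs = sorted(vertices)
    inside = [(u, v) for (u, v) in edges if u in vertices and v in vertices]
    first, rest = vs[0], vs[1:]
    best = len(inside)
    # Enumerate every proper side of the bipartition that contains `first`.
    for r in range(len(rest)):
        for extra in combinations(rest, r):
            side = {first, *extra}
            crossing = sum((u in side) != (v in side) for (u, v) in inside)
            best = min(best, crossing)
    return best


def edge_strengths(vertices, edges):
    """Strength of e = best connectivity of an induced subgraph containing e."""
    strength = {e: 0 for e in edges}
    vs = sorted(vertices)
    for size in range(2, len(vs) + 1):
        for subset in combinations(vs, size):
            s = set(subset)
            k = edge_connectivity(s, edges)
            for (u, v) in edges:
                if u in s and v in s:
                    strength[(u, v)] = max(strength[(u, v)], k)
    return strength


def sampling_probability(s_e, n, c=1.0):
    """Rate of the form min{1, c ln n / s_e}; `c` is an assumed constant."""
    return min(1.0, c * log(n) / s_e)
```

For a triangle with one pendant edge attached, the triangle edges get strength 2 and the pendant edge gets strength 1, so the pendant edge is sampled at the higher rate.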

We now extend this sampling result to any collection of edge-sets for which there exists an injection (one-one mapping) to cuts of comparable inverse strengths. The statement of our Theorem 2.6 closely mirrors the Benczur-Karger sampling theorem, and the proof is also along the same general lines. However, the proof does not follow from the Benczur-Karger sampling theorem in a black-box fashion, so a proof is provided in Appendix B.

Theorem 2.6 Let $H(V, E)$ be any graph on $n$ vertices, let $\mathcal{C}$ denote the set of all possible edge cuts in $H$, and let $\gamma \in (0, 1]$ be a constant. Let $H'$ be a subgraph of $H$ obtained by sampling each edge $e$ in $H$ with probability $p_e = \min\{1, \frac{c \ln n}{\gamma s_e}\}$, where $s_e$ denotes the strength of edge $e$, and $c$ is a suitably large constant. Further, let $\mathcal{X}$ be a collection of subsets of edges, and let $f$ be a one-one (not necessarily onto) mapping from $\mathcal{X}$ to $\mathcal{C}$ satisfying $\sum_{e \in X} 1/s_e > \gamma \sum_{e \in f(X)} 1/s_e$ for all $X \in \mathcal{X}$. Then



Pr [No edge in X is chosen in H'] < — ^ . 
The result below from [2] bounds the number of edges chosen by the sampling in Theorem 2.6.



Theorem 2.7 Let $H(V, E)$ be any graph on $n$ vertices, and let $H'$ be a subgraph of $H$ obtained by sampling each edge $e$ in $H$ with probability $p_e = \min\{1, \frac{c \ln n}{s_e}\}$, where $s_e$ denotes the strength of edge $e$, and $c$ is

any constant. Then with probability at least 1 — the graph H' contains at most c'n In n edges, where d is 
another suitably large constant. 

We conclude with a simple property of integer multisets that we will use later. A similar statement was 
used in ifTTIl (lemma 4.5). A proof is provided in appendix |C] for completeness. 

Lemma 2.8 Let S\ and S2 be two arbitrary multisets of positive integers such that \Si\ > ^\S2\ for some 
7 > 0. Then there exists an integer j such that 

i>j and and i(iS2 
3 Proportionate Uncrossing of Witness Sets



Consider a bipartite graph $G = (P, Q, E)$, with a non-negative weight function $t$ defined on the edges. Assume further that we are given a set of "relevant edges" $E_R \subseteq E$. We can extend the definition of $t$ to sets of edges, so that $t(S) = \sum_{e \in S} t(e)$, where $S \subseteq E$.

Definition 3.1 For any $\gamma > 0$ and $A \subseteq P$, $B \subseteq Q$, the pair $(A, B)$ is said to be $\gamma$-thick with respect to $(G, t, E_R)$ if $t(W(A, B) \cap E_R) > \gamma\, t(C(A, B))$, i.e., the total weight of the relevant edges in $W(A, B)$ is strictly more than $\gamma$ times the total weight of $C(A, B)$. A set of pairs $\mathcal{R} = \{(A_1, B_1), (A_2, B_2), \ldots, (A_k, B_k)\}$ where each $A_i \subseteq P$ and each $B_i \subseteq Q$ is said to be a $\gamma$-thick collection with respect to $(G, t, E_R)$ if every pair $(A_i, B_i) \in \mathcal{R}$ is $\gamma$-thick.
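The thickness condition of Definition 3.1 is easy to evaluate directly. The following sketch is illustrative only; the helper names and the toy graph (a 2-regular bipartite graph on $n = 4$ with unit weights, split into two pieces) are assumptions made for this example and do not come from the paper.

```python
def witness_set(edges, A, B):
    """W(A, B): edges from A to Q minus B."""
    return {(u, v) for (u, v) in edges if u in A and v not in B}


def cut_set(edges, A, B):
    """C(A, B): edges between A union B and the remaining vertices."""
    return {(u, v) for (u, v) in edges if (u in A) != (v in B)}


def weight(t, S):
    """t(S): total weight of the edge set S."""
    return sum(t[e] for e in S)


def is_thick(edges, t, relevant, A, B, gamma):
    """Strict inequality t(W(A,B) & E_R) > gamma * t(C(A,B))."""
    lhs = weight(t, witness_set(edges, A, B) & relevant)
    return lhs > gamma * weight(t, cut_set(edges, A, B))


# Toy instance: edges (i, i) and (i, i+1 mod 4), pieces {0,1} and {2,3}.
n = 4
edges = {(i, i) for i in range(n)} | {(i, (i + 1) % n) for i in range(n)}
piece = [0, 0, 1, 1]  # piece index, used for both P and Q
relevant = {(u, v) for (u, v) in edges if piece[u] == piece[v]}
t = {e: 1.0 for e in edges}
A, B = {0, 1}, {1}
```

Here $W(A, B) = C(A, B) = \{(0,0), (1,2)\}$, but only $(0,0)$ is relevant, so the pair is $\frac{1}{4}$-thick and, because the inequality is strict, not $\frac{1}{2}$-thick.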

The quantities $G$, $t$, and $E_R$ will be fixed for this section, and for brevity, we will omit the phrase "with respect to $(G, t, E_R)$" in the rest of this section.

Before defining proportionate uncrossings of witness sets, we will informally point out the motivation for doing so. If a pair $(A, B)$ is $\gamma$-thick for some constant $\gamma$, and if we know that a sampling process where edge $e$ is chosen with probability $t(e)$ chooses some edge from $C(A, B)$ with high probability, then increasing the sampling probability by a factor of $1/\gamma$ should result in some relevant edge from $W(A, B)$ being chosen with high probability as well, a fact that would be very useful in the rest of this paper. The sampling sub-routines that we employ in the rest of this paper are analyzed by using a union bound over all cuts, and in order to apply the same union bound, it would be useful if each witness set were to correspond to a unique cut. However, in Figure 1(a), we show two pairs $(A, B)$ and $(X, Y)$ which are both $(1/2)$-thick but correspond to the same cut; we call this a "crossing" of the pairs $(A, B)$ and $(X, Y)$, drawing intuition from the figure. In general, we can have many witness sets that map to the same cut. We would like to "uncross" these witness sets by finding subsets of each witness set that map to unique cuts, but there is no way to uncross Figure 1(a) in this fashion. Fortunately, and somewhat surprisingly, this is the worst case: any collection of $\gamma$-thick pairs can be uncrossed into another collection such that all the pairs in the new collection are also $\gamma$-thick (hence the term proportionate uncrossing), every original witness set has a representative in this new collection, and no more than two new pairs have the same cut. Figure 1(b) shows two $\frac{1}{2}$-thick pairs that can be uncrossed using a single $\frac{1}{2}$-thick representative, $(A \cap X, B \cap Y)$. We will spend the rest of this section formalizing the notion of proportionate uncrossings and proving their existence. The uncrossing process is algorithmically inefficient, but we only need to demonstrate existence for the purpose of this paper. The arguments in this section represent the primary technical contribution of this paper; these arguments apply to bipartite graphs in general (not necessarily regular), and may be independently interesting.



Figure 1: Both (a) and (b) depict two $\frac{1}{2}$-thick pairs $(A, B)$ and $(X, Y)$ that have different witness sets but the same cut (i.e., $W(A, B) \neq W(X, Y)$ but $C(A, B) = C(X, Y)$). The pairs in (a) cannot be uncrossed, whereas the pairs in (b) can be uncrossed by choosing the single pair $(A \cap X, B \cap Y)$ as a representative.
3.1 Proportionate Uncrossings: Definitions and Properties

Definition 3.2 A $\gamma$-uncrossing of a $\gamma$-thick collection $\mathcal{R}$ is another $\gamma$-thick collection of pairs $\mathcal{T}$ that satisfies the three properties below:

P1: For every pair $(A, B) \in \mathcal{R}$ there exists a pair $(A', B') \in \mathcal{T}$ such that $C(A', B') \subseteq C(A, B)$, and $W(A', B') \subseteq W(A, B)$. We will refer to $(A', B')$ as a representative of $(A, B)$.

P2: For every $(A', B') \in \mathcal{T}$, there exists $(A, B) \in \mathcal{R}$ such that $C(A', B') \subseteq C(A, B)$.

P3: (Half-injectivity): There cannot be three distinct pairs $(A, B)$, $(A', B')$, and $(A'', B'')$ in $\mathcal{T}$ such that $C(A, B) = C(A', B') = C(A'', B'')$.

Since $\mathcal{T}$ has the same (or larger) thickness as the thickness guarantee that we had for $\mathcal{R}$, it seems appropriate to refer to $\mathcal{T}$ as a proportionate uncrossing of $\mathcal{R}$.

Definition 3.3 A $\gamma$-partial-uncrossing of a $\gamma$-thick collection $\mathcal{R}$ is another $\gamma$-thick collection of pairs $\mathcal{T}$ which satisfies properties P1 and P2 above but not necessarily P3.

The following three lemmas follow immediately from the two definitions above, and it will be useful to state them explicitly. Informally, the first says that every collection is its own partial uncrossing, the second says that uncrossings can be composed, and the third says that the union of the partial uncrossings of two collections is a partial uncrossing of the union of the collections.

Lemma 3.4 If $\mathcal{R}$ is a $\gamma$-thick collection, then $\mathcal{R}$ is a $\gamma$-partial-uncrossing of itself.

Lemma 3.5 If $\mathcal{S}$ is a $\gamma$-partial-uncrossing of a $\gamma$-thick collection $\mathcal{R}$, and $\mathcal{T}$ is a $\gamma$-uncrossing of $\mathcal{S}$, then $\mathcal{T}$ is also a $\gamma$-uncrossing of $\mathcal{R}$.

Lemma 3.6 If $\mathcal{R}_1$ and $\mathcal{R}_2$ are two $\gamma$-thick collections, $\mathcal{T}_1$ is a $\gamma$-partial-uncrossing of $\mathcal{R}_1$, and $\mathcal{T}_2$ is a $\gamma$-partial-uncrossing of $\mathcal{R}_2$, then $\mathcal{T}_1 \cup \mathcal{T}_2$ is a $\gamma$-partial-uncrossing of $\mathcal{R}_1 \cup \mathcal{R}_2$.

3.2 Proportionate Uncrossings: An Existence Theorem 

The main technical result of this section is the following: 

Theorem 3.7 For every $\gamma$-thick collection $\mathcal{R}$, there exists a $\gamma$-uncrossing of $\mathcal{R}$.

The proof is via induction over the "largest cut" corresponding to any pair in the collection $\mathcal{R}$; each inductive step "uncrosses" the witness sets that correspond to this largest cut. Before proving this theorem, we need to provide several useful definitions and also establish a key lemma.

Define some total ordering $\prec$ over all subsets of $E$ which respects set cardinality, so that if $|E_1| < |E_2|$ then $E_1 \prec E_2$. Overload notation to use $C(\mathcal{R})$ to denote the set of cuts $\{C(A, B) : (A, B) \in \mathcal{R}\}$. Analogously, use $W(\mathcal{R})$ to denote the set of witness sets corresponding to pairs in $\mathcal{R}$. Since $C(A, B)$ may be equal to $C(A', B')$ for $(A, B) \neq (A', B')$, it is possible that $|C(\mathcal{R})|$ may be smaller than $|\mathcal{R}|$. In fact, if $|\mathcal{R}|$ and $|C(\mathcal{R})|$ are equal, then $\mathcal{R}$ is its own $\gamma$-uncrossing and the theorem is trivially true. Similarly, it is possible that $W(A, B)$ is equal to $W(A', B')$ for two different pairs $(A, B)$ and $(A', B')$ in $\mathcal{R}$. However, suppose $W(A, B) = W(A', B')$ and $C(A, B) = C(A', B')$ for two different pairs $(A, B)$ and $(A', B')$ in $\mathcal{R}$. In this case, we can remove one of the two pairs from the collection to obtain a new collection $\mathcal{R}'$; it is easy to see that a $\gamma$-uncrossing of $\mathcal{R}'$ is also a $\gamma$-uncrossing of $\mathcal{R}$. So we will assume without loss of generality that for any two pairs $(A, B)$ and $(A', B')$ in $\mathcal{R}$, either $W(A, B) \neq W(A', B')$ or $C(A, B) \neq C(A', B')$; we will call this the non-redundancy assumption.






We will now prove a key lemma which contains the meat of the uncrossing argument. When we use this lemma later in the proof of Theorem 3.7, we will only use the fact that there exists a $\gamma$-partial-uncrossing of $\mathcal{R}$, where $\mathcal{R}$ satisfies the preconditions of the lemma. However, the stronger claim of existence of a $\gamma$-uncrossing does not require much additional work and appears to be an interesting graph theoretic argument in its own right, so we prove this stronger claim.

Lemma 3.8 If $\mathcal{R}$ is a $\gamma$-thick collection such that $|\mathcal{R}| > 2$, $\mathcal{R}$ satisfies the non-redundancy assumption, and $C(\mathcal{R})$ contains a single set $S$, then there exists a $\gamma$-uncrossing $\mathcal{T}$ of $\mathcal{R}$. Further, for every pair $(A, B) \in \mathcal{T}$, we have $C(A, B) \subset S$.

Proof: Let $\mathcal{R} = \{(A_1, B_1), (A_2, B_2), \ldots, (A_J, B_J)\}$. Since $C(A_i, B_i) = S$ for all $i$, we know by the non-redundancy assumption that $W(A_i, B_i) \neq W(A_{i'}, B_{i'})$ for $i \neq i'$. We break the proof down into multiple stages.

1. Definition of Venn witnesses and Venn cuts. For any $J$-dimensional bit-vector $b \in \{0, 1\}^J$, define

$$A_{(b)} = \Bigl(P \cap \bigcap_{i : b_i = 1} A_i\Bigr) \setminus \bigcup_{i : b_i = 0} A_i,$$

and similarly,

$$B_{(b)} = \Bigl(Q \cap \bigcap_{i : b_i = 1} B_i\Bigr) \setminus \bigcup_{i : b_i = 0} B_i.$$

We overload notation and use $W_{(b)}$ to denote the witness set $W(A_{(b)}, B_{(b)})$ and $C_{(b)}$ to denote the cut set $C(A_{(b)}, B_{(b)})$. A node $u$ belongs to $A_{(b)}$ if it is in every set $A_i$ such that $b_i = 1$ and not in any of the sets $A_i$ for which $b_i = 0$. Thus, each $A_{(b)}$ corresponds to one of the regions in the Venn diagram of the sets $A_1, A_2, \ldots, A_J$, and the analogous statement holds for each $B_{(b)}$. Hence, we will refer to the sets $W_{(b)}$ and $C_{(b)}$ as the Venn-witness and the Venn-cut for $b$, respectively, and refer to the pair $(A_{(b)}, B_{(b)})$ as a Venn pair. Also, we will use $\bar{b}$ to refer to the vector which differs from $b$ in every bit.

2. The special structure of Venn witnesses and Venn cuts. Consider an edge $(u, v)$ that goes out of $A_{(b)}$. Suppose that edge goes to $B_{(d)}$ where $d \neq b$ and $d \neq \bar{b}$. Then there must exist $1 \le i, i' \le J$ such that $b_i = d_i$ and $b_{i'} \neq d_{i'}$. Since $b_i = d_i$, either $u \in A_i, v \in B_i$ (if $b_i = d_i = 1$) or $u \notin A_i, v \notin B_i$ (if $b_i = d_i = 0$). In either case the edge $(u, v)$ does not belong to the cut $C(A_i, B_i)$, and since all pairs in $\mathcal{R}$ have the same cut $S$, we conclude that $(u, v) \notin S$. On the other hand, since $b_{i'} \neq d_{i'}$, either $u \in A_{i'}, v \notin B_{i'}$ (if $b_{i'} = 1$, $d_{i'} = 0$) or $u \notin A_{i'}, v \in B_{i'}$ (if $b_{i'} = 0$, $d_{i'} = 1$). In either case the edge $(u, v)$ belongs to the cut $C(A_{i'}, B_{i'})$ and hence to $S$, which is a contradiction. Thus, any edge from $A_{(b)}$ goes to either $B_{(b)}$ or $B_{(\bar{b})}$.

If the edge $(u, v)$ goes to $B_{(b)}$ then it does not belong to any witness set in $W(\mathcal{R})$, any Venn witness set, any Venn cut, or $S$. If $(u, v)$ goes to $B_{(\bar{b})}$ then it belongs to $S$, to the Venn witness set $W_{(b)}$, to the Venn cuts $C_{(b)}$ and $C_{(\bar{b})}$, and to no other Venn witness set or Venn cut. This edge also belongs to $W(A_i, B_i)$ for all $i$ such that $b_i = 1$. These observations, and the definitions of Venn witnesses, cuts, and pairs easily lead to the following consequences:

$$W_{(b)} \cap W_{(d)} = \emptyset \ \text{ if } b \neq d, \tag{1}$$

$$W(A_i, B_i) = \bigcup_{b \in \{0,1\}^J : b_i = 1} W_{(b)}, \tag{2}$$

$$C_{(b)} = C_{(\bar{b})}, \tag{3}$$

$$C_{(b)} \cap C_{(d)} = \emptyset \ \text{ if } b \neq d \text{ and } b \neq \bar{d}, \tag{4}$$

$$(\forall i, 1 \le i \le J): \quad S = \bigcup_{b \in \{0,1\}^J : b_i = 1} C_{(b)}, \tag{5}$$

and finally,

$$W_{(b)} \cup W_{(\bar{b})} = C_{(b)}. \tag{6}$$

3. The collection $\mathcal{T}$. Define $\mathcal{T}$ to consist of all $\gamma$-thick Venn pairs $(A_{(b)}, B_{(b)})$ where $b$ is not the all zero vector.

4. Proving that $\mathcal{T}$ is a $\gamma$-uncrossing of $\mathcal{R}$. (P1): Fix some $i$, $1 \le i \le J$. Since $\mathcal{R}$ is a $\gamma$-thick collection, it follows from the definition that $(A_i, B_i)$ must be a $\gamma$-thick pair. From equations (2) and (1) we know that $t(W(A_i, B_i) \cap E_R) = \sum_{b \in \{0,1\}^J : b_i = 1} t(W_{(b)} \cap E_R)$. We also know, from equations (4) and (5), that $t(S) = \sum_{b \in \{0,1\}^J : b_i = 1} t(C_{(b)})$. Hence, there must be some $b \in \{0, 1\}^J$ such that $b_i = 1$ and $(A_{(b)}, B_{(b)})$ is $\gamma$-thick, which in turn implies that $(A_{(b)}, B_{(b)})$ is in $\mathcal{T}$. This is the representative of $(A_i, B_i)$ and hence $\mathcal{T}$ satisfies P1. (P2): This follows trivially from equation (5). (P3): From equation (4) we know that there are only two possible Venn pairs (specifically, $(A_{(b)}, B_{(b)})$ and $(A_{(\bar{b})}, B_{(\bar{b})})$) that have the same non-empty cut $C_{(b)}$. Observe that our definition of $\gamma$-thickness involves "strict inequality", and hence Venn pairs where the Venn witness set and the Venn cut are both empty can't be $\gamma$-thick and can't be in $\mathcal{T}$.

5. Proving that $C(X, Y) \subset S$ for all pairs $(X, Y) \in \mathcal{T}$. Any cut $C(A, B) \in C(\mathcal{T})$ is of the form $C_{(b)}$ for some $J$-dimensional bit vector $b$. Each $C_{(b)} \subseteq S$, from equation (5). We will now show that this containment is strict. Suppose not, i.e., there exists some $C_{(b)} = S$. By equation (3), $C_{(\bar{b})} = S$ as well. Since $J > 2$, either $b$ or $\bar{b}$ must have two bits that are set to 1; without loss of generality, assume that $b_1 = b_2 = 1$. From equations (1) and (6) we know that $C_{(b)}$ (and hence $S$) is the disjoint union of $W_{(b)}$ and $W_{(\bar{b})}$. Any edge in $W_{(b)}$ must belong to both $W(A_1, B_1)$ and $W(A_2, B_2)$, whereas any edge in $W_{(\bar{b})}$ cannot belong to either $W(A_1, B_1)$ or $W(A_2, B_2)$. Hence, $W(A_1, B_1) = W(A_2, B_2) = W_{(b)}$, which contradicts the non-redundancy assumption on $\mathcal{R}$. Therefore, we must have $C_{(b)} \subset S$. ■
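The Venn-pair construction in the proof above can be traced on a toy instance. The sketch below is illustrative only: the graph (a perfect matching on four vertices per side), the three pairs sharing one cut, the unit weights, and the function names are all assumptions made for this example. It builds the collection $\mathcal{T}$ of stage 3 by enumerating all non-zero bit vectors, which is exponential in $J$ and matches the remark that the uncrossing process is not meant to be efficient.

```python
from itertools import product


def witness_set(edges, A, B):
    """W(A, B): edges from A to Q minus B."""
    return {(u, v) for (u, v) in edges if u in A and v not in B}


def cut_set(edges, A, B):
    """C(A, B): edges between A union B and the remaining vertices."""
    return {(u, v) for (u, v) in edges if (u in A) != (v in B)}


def venn_pair(P, Q, pairs, b):
    """(A_(b), B_(b)): inside A_i, B_i where b_i = 1, outside where b_i = 0."""
    A, B = set(P), set(Q)
    for (Ai, Bi), bit in zip(pairs, b):
        if bit:
            A &= Ai
            B &= Bi
        else:
            A -= Ai
            B -= Bi
    return frozenset(A), frozenset(B)


def venn_uncrossing(P, Q, edges, t, relevant, pairs, gamma):
    """All gamma-thick Venn pairs for non-zero bit vectors b (stage 3)."""
    T = []
    for b in product((0, 1), repeat=len(pairs)):
        if not any(b):
            continue  # the all-zero vector is excluded
        A, B = venn_pair(P, Q, pairs, b)
        w = sum(t[e] for e in witness_set(edges, A, B) & relevant)
        c = sum(t[e] for e in cut_set(edges, A, B))
        if w > gamma * c:  # strict inequality, as in Definition 3.1
            T.append((A, B))
    return T


# Toy instance: three pairs with the same cut S = {(0,0), (1,1), (2,2)}.
P = Q = {0, 1, 2, 3}
edges = {(i, i) for i in range(4)}
t = {e: 1.0 for e in edges}
pairs = [({0, 1, 2}, set()), ({0, 1}, {2}), ({0}, {1, 2})]
T = venn_uncrossing(P, Q, edges, t, edges, pairs, 0.25)
```

In this instance `T` consists of the three Venn pairs `({2}, {})`, `({1}, {})`, and `({0}, {})`. Their cuts are distinct single edges, each strictly contained in $S$, and every original pair has a representative among them.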



Proof of Theorem 3.7: The proof will be by induction over the largest set in $C(\mathcal{R})$ according to the ordering $\prec$. Let $M(\mathcal{R})$ denote this largest set.

For the base case, suppose $M(\mathcal{R})$ is the smallest set $S$ under the ordering $\prec$. Then $S$ must be a singleton, $C(\mathcal{R})$ must have just a single set $S$, and $W(\mathcal{R})$ must also have a single witness set, which must be the same as $S$ since $\mathcal{R}$ is $\gamma$-thick. By the non-redundancy assumption, $\mathcal{R}$ must have at most one pair, and is its own $\gamma$-uncrossing.

For the inductive step, consider any possible cut $S$ and assume that the theorem is true when $M(\mathcal{R}) \prec S$. We will show that the theorem is also true when $M(\mathcal{R}) = S$, which will complete the inductive proof.

Suppose there is a unique $(A, B) \in \mathcal{R}$ such that $C(A, B) = S$. Intuitively, one would expect this to be the easy case, since there is no "uncrossing" to be done for $S$, and indeed, this case is quite straightforward. Define $\mathcal{R}' = \mathcal{R} \setminus \{(A, B)\}$. Let $\mathcal{T}'$ denote a $\gamma$-uncrossing of $\mathcal{R}'$, which is guaranteed to exist by the inductive hypothesis. Since $\mathcal{T}'$ is $\gamma$-thick, so is $\mathcal{T} = \mathcal{T}' \cup \{(A, B)\}$. The pair $(A, B)$ clearly has a representative in $\mathcal{T}$ (itself), and any $(A', B') \in \mathcal{R} \setminus \{(A, B)\}$ has a representative in $\mathcal{T}'$ and hence also in $\mathcal{T}$. Thus, $\mathcal{T}$ satisfies property P1 for being a $\gamma$-uncrossing of $\mathcal{R}$. Every set in $C(\mathcal{T}')$ is a subset of some cut in $C(\mathcal{R}')$ (by property P2) and $C(A, B)$ is also in $C(\mathcal{R})$, and hence $\mathcal{T}$ satisfies property P2 for being a $\gamma$-uncrossing of $\mathcal{R}$. Every set in $C(\mathcal{T}')$ is smaller than $C(A, B)$ according to $\prec$ and $\mathcal{T}'$ satisfies property P3. Hence, $\mathcal{T}$ also satisfies property P3. Thus, $\mathcal{T}$ is a $\gamma$-uncrossing of $\mathcal{R}$. If there are exactly two distinct pairs $(A, B)$ and $(A', B')$ in $\mathcal{R}$ such that $C(A, B) = C(A', B') = S$, then the same argument works again, except that $\mathcal{R}' = \mathcal{R} \setminus \{(A, B), (A', B')\}$ and $\mathcal{T} = \mathcal{T}' \cup \{(A, B), (A', B')\}$.

We now need to tackle the most interesting case of the inductive step, where there are more than two pairs in $\mathcal{R}$ that correspond to the same cut $S$. Write $\mathcal{R} = \mathcal{R}_1 \cup \mathcal{R}_2$ where $C(A, B) \prec S$ for all $(A, B) \in \mathcal{R}_1$ and $C(A, B) = S$ for all $(A, B) \in \mathcal{R}_2$. Recall that for two different pairs $(A, B)$ and $(A', B')$ in $\mathcal{R}_2$, we must have $W(A, B) \neq W(A', B')$ by the non-redundancy assumption. From Lemma 3.8, there exists a $\gamma$-partial-uncrossing, say $\mathcal{S}_2$, of $\mathcal{R}_2$ with the property that for every set $S' \in C(\mathcal{S}_2)$, we have $S' \subset S$, and hence $S' \prec S$. By Lemma 3.4, we know that $\mathcal{R}_1$ is its own $\gamma$-partial-uncrossing. Further, by definition of $\mathcal{R}_1$, every set $S' \in C(\mathcal{R}_1)$ must satisfy $S' \prec S$. Define $\mathcal{S} = \mathcal{R}_1 \cup \mathcal{S}_2$. By Lemma 3.6, $\mathcal{S}$ is a $\gamma$-partial-uncrossing of $\mathcal{R}_1 \cup \mathcal{R}_2$, i.e., of $\mathcal{R}$. Further, for every cut $S' \in C(\mathcal{S})$, we have $S' \prec S$. Hence, by our inductive hypothesis, there exists a $\gamma$-uncrossing of $\mathcal{S}$; let $\mathcal{T}$ be a $\gamma$-uncrossing of $\mathcal{S}$. By Lemma 3.5, $\mathcal{T}$ is also a $\gamma$-uncrossing of $\mathcal{R}$, which completes the inductive proof. ■

Remark 3.9 An alternate approach to relating cuts and witness sets is to suitably modify the proof of the Benczúr-Karger sampling theorem, circumventing the need for the proportionate uncrossing theorem. The idea is based on the observation that Karger's sampling theorem also holds for vertex cuts in graphs. Since the Benczúr-Karger sampling theorem is proved using multiple invocations of Karger's sampling theorem, it is possible to set up a correspondence between cuts and witness sets using a vertex-cut version of the Benczúr-Karger sampling theorem. However, we prefer to use here the approach based on the proportionate uncrossing theorem, as it is an interesting combinatorial statement in its own right.

4 An $\tilde{O}(n^{1.5})$ Time Algorithm for Finding a Perfect Matching

We present here an $\tilde{O}(n^{1.5})$ time randomized algorithm to find a perfect matching in a given $d$-regular bipartite graph $G(P, Q, E)$ on $2n$ vertices. Throughout this section, we follow the convention that for any pair $(A, B)$, the sets $C(A, B)$ and $W(A, B)$ are defined with respect to the graph $G$. Our starting point is the following theorem, established by Goel, Kapralov, and Khanna [8].¹

Theorem 4.1 Let $G(P, Q, E)$ be a $d$-regular bipartite graph, $\epsilon$ any number in $(0, 1)$, and $c$ a suitably large constant that depends on $\epsilon$. There exists a decomposition of $G$ into $k = O(n/d)$ vertex-disjoint bipartite graphs, say $G_1 = (P_1, Q_1, E_1), G_2 = (P_2, Q_2, E_2), \ldots, G_k = (P_k, Q_k, E_k)$, such that

1. Each $G_i$ contains at least $d/2$ perfect matchings, and the minimum cut in each $G_i$ is $\Omega(d^2/n)$.

2. Let $\mathcal{R}$ denote the set of relevant pairs with respect to this decomposition, and $E_R$ denote the set of relevant edges. Then for each $(A, B)$ in $\mathcal{R}$, we have $|W(A, B) \cap E_R| > \frac{1}{2}|C(A, B)|$.

3. Let $G'(P, Q, E')$ be a random graph generated by sampling the edges of $G$ uniformly at random with probability $p = \frac{c\, n \ln n}{d^2}$. Then with probability at least $1 - 1/n$, for every pair $(A, B) \in \mathcal{R}$,

$$|W(A, B) \cap E' \cap E_R| > (1 - \epsilon)\, p\, |W(A, B) \cap E_R| > \frac{1 - \epsilon}{2(1 + \epsilon)}\, |C(A, B) \cap E'|.$$

The last condition above says that in addition to all cuts, all relevant witness edge sets are also preserved to within $(1 \pm \epsilon)$ of their expected value in $G'$, with high probability. We emphasize here that the decomposition highlighted in Theorem 4.1 will be used only in the analysis of our algorithm; the algorithm itself is oblivious to this decomposition.

Our algorithm consists of the following three steps.

(S1) Generate a random graph $G' = (P, Q, E')$ by sampling edges of $G$ uniformly at random with probability $p = \frac{c_1 n \ln n}{d^2}$, where $c_1$ is a constant as in Theorem 4.1.² We choose $\epsilon$ to be any fixed constant not larger than 0.2.

(S2) The graph $G'$ contains $O\left(\frac{n^2 \ln n}{d}\right)$ edges w.h.p. We now run the Benczúr-Karger sampling algorithm [2] that takes $O(|E'| \ln^2 n)$ time to compute the strength $s'_e$ of every edge $e$, and samples each edge $e$ with probability $p_e$,³ where $p_e$ is as given by Theorem 2.6 with $\gamma = 1/3$. We show below that w.h.p. the graph $G'' = (P, Q, E'')$ obtained from this sampling contains a perfect matching.

(S3) Finally, we run the Hopcroft-Karp algorithm to obtain a maximum cardinality matching in $G''$ in $O(n^{1.5} \ln n)$ time, since by Theorem 2.7 $G''$ contains $O(n \ln n)$ edges w.h.p.

¹ Part 1 of Theorem 4.1 corresponds to theorem 2.3 in [8], part 2 is proved as part of the proof of theorem 2.1 in [8], and part 3 combines remark 2.5 in [8] with Karger's sampling theorem [10].

² The time required for this sampling is proportional to the number of edges chosen, assuming the graph is presented in an adjacency list representation with each list stored in an array.
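The uniform sampling of step (S1) can be done in time proportional to the number of edges kept by jumping directly from one kept edge to the next. The following is a minimal sketch of that idea, not the authors' implementation: it assumes the edges are stored in one flat array, and the names `sample_edge_indices`, `sampling_probability` and the default constant `c1=1.0` are hypothetical choices made only for illustration.

```python
import math
import random


def sampling_probability(n, d, c1=1.0):
    """p = c1 * n * ln(n) / d^2, capped at 1 (c1 is an illustrative constant)."""
    return min(1.0, c1 * n * math.log(n) / d ** 2)


def sample_edge_indices(num_edges, p, rng):
    """Indices kept when each of num_edges edges is retained independently
    with probability p. Runs in time proportional to the number of kept edges
    by drawing the geometric gap between consecutive kept indices."""
    if p <= 0.0:
        return []
    if p >= 1.0:
        return list(range(num_edges))
    kept = []
    log_q = math.log1p(-p)           # log(1 - p), strictly negative
    i = -1
    while True:
        u = 1.0 - rng.random()       # uniform in (0, 1], so log(u) is defined
        i += 1 + int(math.log(u) / log_q)   # skip a geometric number of edges
        if i >= num_edges:
            return kept
        kept.append(i)


if __name__ == "__main__":
    rng = random.Random(0)
    p = sampling_probability(n=10_000, d=2_000)
    print(p, len(sample_edge_indices(10_000 * 2_000, p, rng)))
```

Each index in the returned list would then be mapped back to an edge of the flat array to form $E'$.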



Running time: With high probability, the running time of this algorithm is bounded by $O\left(\frac{n^2}{d} \ln^3 n + n^{1.5} \ln n\right)$. Since we can always use the algorithm of Cole, Ost, and Schirra [6] instead, the final running time is $O\left(\min\left\{m, \frac{n^2}{d} \ln^3 n + n^{1.5} \ln n\right\}\right)$. This reduces to $O(m)$ if $d \le \sqrt{n} \ln n$; to $O(n^{1.5} \ln n)$ when $d \ge \sqrt{n} \ln^2 n$; and to at most $O((n \ln n)^{1.5})$ in the narrow range $\sqrt{n} \ln n < d < \sqrt{n} \ln^2 n$.
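The three regimes of the running-time bound can be checked numerically. The sketch below only evaluates the expressions from the paragraph above with all hidden constants set to 1, which is an assumption made for illustration; the function names are hypothetical.

```python
import math


def running_time_bound(n, d):
    """min{m, (n^2/d) ln^3 n + n^1.5 ln n} with m = n*d and constants dropped."""
    m = n * d
    sampled = (n * n / d) * math.log(n) ** 3 + n ** 1.5 * math.log(n)
    return min(m, sampled)


def regime(n, d):
    """Which of the three simplified bounds applies for the given n and d."""
    root, ln = math.sqrt(n), math.log(n)
    if d <= root * ln:
        return "O(m)"
    if d >= root * ln ** 2:
        return "O(n^1.5 ln n)"
    return "O((n ln n)^1.5)"


if __name__ == "__main__":
    n = 10 ** 6
    for d in (1_000, 50_000, 500_000):
        print(d, regime(n, d), running_time_bound(n, d))
```

In the middle regime the minimum is attained by whichever of the two terms is smaller; the two cross at $d = \sqrt{n} \ln^{1.5} n$, where both equal $(n \ln n)^{1.5}$ up to lower-order terms.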



Correctness: To prove correctness, we need to show that $G''$ contains a perfect matching w.h.p.

Theorem 4.2 The graph $G''$ contains a perfect matching with probability $1 - O(1/n)$.

Proof: Consider the decomposition defined in Theorem 4.1. Let $\mathcal{R}$ denote the set of relevant pairs with respect to this decomposition, and let $E_R$ denote the set of all relevant edges with respect to this decomposition. We will now focus on proving that, with high probability, for every $(A, B) \in \mathcal{R}$, $W(A, B) \cap E_R \cap E'' \neq \emptyset$; by Lemma 2.3 this is sufficient to prove the theorem.

For convenience, define $W'(A, B) = W(A, B) \cap E'$ and $C'(A, B) = C(A, B) \cap E'$. Assume for now that the low-probability event in Theorem 4.1 does not occur. Thus, by choosing $\epsilon \le 0.2$, we know that for $\gamma = 1/3$, every relevant pair $(A, B) \in \mathcal{R}$ satisfies $|W'(A, B) \cap E_R| > \gamma |C'(A, B)|$.
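The threshold $\epsilon \le 0.2$ is exactly what makes the constant in condition 3 of Theorem 4.1 at least $\gamma = 1/3$. As a short check, assuming that constant is $\frac{1-\epsilon}{2(1+\epsilon)}$ (our reading of the bound, which is what condition 2 together with $(1+\epsilon)$-preservation of cuts gives):

```latex
\[
\frac{1-\epsilon}{2(1+\epsilon)} \ge \frac{1}{3}
\iff 3(1-\epsilon) \ge 2(1+\epsilon)
\iff \epsilon \le \frac{1}{5},
\]
\[
\text{so}\quad
|W'(A,B) \cap E_R| \;>\; \frac{1-\epsilon}{2(1+\epsilon)}\,|C'(A,B)| \;\ge\; \frac{1}{3}\,|C'(A,B)|
\quad\text{whenever } \epsilon \le 0.2 .
\]
```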

Let $s'_e$ denote the strength of $e$ in $G'$. Recall that $G'_{[j]} = (V, E'_{[j]})$ is the graph with the same vertex set as $G'$ but consisting of only those edges in $E'$ which have strength at least $j$. Define $W'_{[j]}(A, B)$ to be the set of all edges in $W'(A, B) \cap E'_{[j]}$; define $C'_{[j]}(A, B)$ analogously. Define $t(e) = 1/s'_e$. Since $|W'(A, B) \cap E_R| > \gamma |C'(A, B)|$, by Lemma 2.8 there must exist a $j$ such that

$$\sum_{e \in W'(A, B) \cap E_R,\; s'_e \ge j} \frac{1}{s'_e} \;>\; \gamma \sum_{e \in C'(A, B),\; s'_e \ge j} \frac{1}{s'_e},$$

which implies that $(A, B)$ is $\gamma$-thick with respect to $(G'_{[j]}, t, E_R)$, as defined in Definition 3.1. Partition $\mathcal{R}$ into $\mathcal{R}_{[1]}, \mathcal{R}_{[2]}, \ldots, \mathcal{R}_{[n]}$, such that if $(A, B) \in \mathcal{R}_{[j]}$ then $(A, B)$ is $\gamma$-thick with respect to $(G'_{[j]}, t, E_R)$, breaking ties arbitrarily if $(A, B)$ can belong to multiple $\mathcal{R}_{[j]}$. Consider an arbitrary non-empty $\mathcal{R}_{[j]}$. Let $\mathcal{T}$ represent a $\gamma$-uncrossing of $\mathcal{R}_{[j]}$, as guaranteed by Theorem 3.7. By property P3, no three pairs in a $\gamma$-uncrossing can have the same cut; partition $\mathcal{T}$ into $\mathcal{T}_1$ and $\mathcal{T}_2$ such that every pair $(A, B) \in \mathcal{T}_1$ has a unique cut $C'_{[j]}(A, B)$ and the same holds for $\mathcal{T}_2$. We focus on $\mathcal{T}_1$ for now. For any $(A, B) \in \mathcal{T}_1$, define $Y(A, B) = W'_{[j]}(A, B) \cap E_R$. Define $\mathcal{X} = \{Y(A, B) : (A, B) \in \mathcal{T}_1\}$. For any $X \in \mathcal{X}$, define $f(X) = C'_{[j]}(A, B)$ for some arbitrary $(A, B) \in \mathcal{T}_1$ such that $X = Y(A, B)$. The function $f$ is one-one by construction, and since $(A, B)$ is $\gamma$-thick, we know that $\sum_{e \in X} 1/s'_e > \gamma \sum_{e \in f(X)} 1/s'_e$. Thus, $\mathcal{X}$ satisfies the preconditions of Theorem 2.6. Further, the sampling probability $p_e$ in step (S2) of the algorithm is chosen to correspond to $\gamma = 1/3$. Thus, with probability at least $1 - 1/n^2$, $X \cap E''$ is non-empty for all $X \in \mathcal{X}$, i.e., $W'_{[j]}(A, B) \cap E_R \cap E'' \neq \emptyset$ for all $(A, B) \in \mathcal{T}_1$. Since $G'_{[j]}$ is a subgraph of $G'$, we can conclude that $W'(A, B) \cap E_R \cap E'' \neq \emptyset$ for all $(A, B) \in \mathcal{T}_1$ with probability at least $1 - 1/n^2$.



Since the analogous argument holds for $\mathcal{T}_2$, we obtain $W'(A, B) \cap E_R \cap E'' \neq \emptyset$ for all $(A, B) \in \mathcal{T}$ with probability at least $1 - 2/n^2$. Since $\mathcal{T}$ is a $\gamma$-uncrossing of $\mathcal{R}_{[j]}$, we use property P1 to conclude that $W'(A, B) \cap E_R \cap E'' \neq \emptyset$ for all $(A, B) \in \mathcal{R}_{[j]}$, again with probability at least $1 - 2/n^2$. Applying the union bound over all $j$, we further conclude that $W'(A, B) \cap E_R \cap E'' \neq \emptyset$ for all $(A, B) \in \mathcal{R}$ with probability at least $1 - 2/n$. As mentioned before, this suffices to prove that $G''$ has a perfect matching with probability at least $1 - 2/n$, by Lemma 2.3. We assumed that condition 3 in Theorem 4.1 is satisfied; this is violated with probability at most $1/n$, which proves that $G''$ has a perfect matching with probability at least $1 - 3/n$. ■

³ In fact, this sampling algorithm computes an upper bound on $s'_e$, but this only affects the running time and the number of edges sampled by a constant factor.

As presented above, the algorithm takes time $\min\{\tilde{O}(n^{1.5}), O(m)\}$ with high probability, and outputs a perfect matching with probability $1 - O(1/n)$. We conclude with two simple observations. First, it is easy to convert this into a Monte Carlo algorithm with a worst-case running time of $\min\{\tilde{O}(n^{1.5}), O(m)\}$, or a Las Vegas algorithm with an expected running time of $\min\{\tilde{O}(n^{1.5}), O(m)\}$. If either the sampling process in steps (S1) or (S2) returns too many edges, or step (S3) does not produce a perfect matching, then (a) abort the computation to get a Monte Carlo algorithm, or (b) run the $O(m)$ time algorithm of Cole, Ost, and Schirra [6] to get a Las Vegas algorithm. Second, by choosing larger constants during steps (S1) and (S2), it is easy to amplify the success probability to be at least $1 - O(1/n^j)$ for any fixed $j \ge 1$.



5 An Improved $O(\min\{nd, (n^2 \ln^3 n)/d\})$ Bound on the Runtime

In this section we give an improved analysis of the runtime of the Hopcroft-Karp algorithm on the subsampled graph, ultimately leading to a bound of $O(\min\{nd, (n^2 \ln^3 n)/d\})$ for our algorithm. The main ingredients of our analysis are (1) a decomposition of the graph $G$ into $O(n/d)$ vertex-disjoint $\Omega(d)$-edge-connected subgraphs, (2) a modification of the uncrossing argument that reveals properties of sufficiently unbalanced witness sets in the sampled graph obtained in step S2, and (3) an upper bound on the length of the shortest augmenting path in the sampled graph relative to any matching of size smaller than $n - 2n/d$.



5.1 Combinatorial uncrossings 



Theorem 5.2 below, which we state for general bipartite graphs, requires a variant of the uncrossing theorem that we formulate now. We introduce the definition of combinatorial uncrossings:

Definition 5.1 Let $\mathcal{R}$ be any collection of pairs $(A, B)$, $A \subseteq P$, $B \subseteq Q$. A combinatorial uncrossing of $\mathcal{R}$ is a tuple $(\mathcal{T}, \mathcal{I})$, where $\mathcal{T}$ is another collection and $\mathcal{I}$ is a mapping from $\mathcal{R}$ to subsets of $\mathcal{T}$, such that the following properties are satisfied:

Q1: For all $(A, B) \in \mathcal{R}$

1. $\{W(A', B')\}_{(A', B') \in \mathcal{I}(A, B)}$ are disjoint;

2. $\{C(A', B')\}_{(A', B') \in \mathcal{I}(A, B)}$ are disjoint;

3. $\{A' \cup B'\}_{(A', B') \in \mathcal{I}(A, B)}$ are disjoint;

4. $A' \subseteq A$, $B' \subseteq B$ for all $(A', B') \in \mathcal{I}(A, B)$;

5. $$W(A, B) = \bigcup_{(A', B') \in \mathcal{I}(A, B)} W(A', B'), \qquad C(A, B) = \bigcup_{(A', B') \in \mathcal{I}(A, B)} C(A', B').$$

Q2: (Half-injectivity) There cannot be three distinct pairs $(A, B)$, $(A', B')$, $(A'', B'')$ in $\mathcal{T}$ such that $C(A, B) = C(A', B') = C(A'', B'')$.

The proof of existence of combinatorial uncrossings is along the lines of the proof of existence of $\gamma$-thick uncrossings, so we omit it here.

For a graph $H$ we denote $W_H(A, B) = W(A, B) \cap E(H)$ and $C_H(A, B) = C(A, B) \cap E(H)$, and omit the subscript when the underlying graph is fixed.

Theorem 5.2 Let $G^*$ be a graph obtained by sampling edges uniformly at random with probability $p$ from a bipartite graph $G = (P, Q, E)$ on $2n$ vertices with a minimum cut of size $k$. Then there exists a constant $c > 0$ such that for all $\epsilon > 0$, if $p > \frac{c \ln n}{\epsilon^2 k}$ then w.h.p. for all $A \subseteq P$ and $B \subseteq Q$, we have

$$p\,|W_G(A, B)| - \epsilon p\,|C_G(A, B)| < |W_{G^*}(A, B)| < p\,|W_G(A, B)| + \epsilon p\,|C_G(A, B)|.$$

Proof: Define $\mathcal{R}$ as the set of pairs $(A, B)$, $A \subseteq P \cap V(G)$, $B \subseteq Q \cap V(G)$. Denote a combinatorial uncrossing of $\mathcal{R}$ by $(\mathcal{T}, \mathcal{I})$. We first prove the statement for pairs from $\mathcal{T}$, and then extend it to pairs from $\mathcal{R}$ to obtain the desired result.

Consider a pair $(A, B) \in \mathcal{T}$. Denote $\Delta_G(A, B) = |W_{G^*}(A, B)| - p\,|W_G(A, B)|$. We shall write $W(A, B)$ and $C(A, B)$ instead of $W_G(A, B)$ and $C_G(A, B)$ in what follows for brevity. We have by Chernoff bounds that for a given pair $(A, B) \in \mathcal{T}$

$$\Pr\bigl[|\Delta_G(A, B)| > \epsilon p\,|C(A, B)|\bigr] < \exp\left(-\left(\frac{\epsilon\,|C(A, B)|}{|W(A, B)|}\right)^2 \frac{p\,|W(A, B)|}{2}\right) \le \exp\left(-\frac{\epsilon^2 p\,|C(A, B)|}{2}\right),$$

since $|C(A, B)| \ge |W(A, B)|$. Since $\mathcal{T}$ satisfies Q2, we get that

$$\Pr\bigl[\exists (A, B) \in \mathcal{T} : |\Delta_G(A, B)| > \epsilon p\,|C(A, B)|\bigr] < \sum_{W(A, B) \in \mathcal{W}(\mathcal{T})} \exp\bigl[-\epsilon^2 p\,|C(A, B)|/2\bigr] \le 2 \sum_{C(A, B) \in \mathcal{C}(\mathcal{T})} \exp\bigl[-\epsilon^2 p\,|C(A, B)|/2\bigr] = O(n^{-r})$$

for $c = 2(r + 2)$ by Corollary 2.4 in [10]. This implies that for $c > 2(r + 2)$ we have with probability $1 - O(n^{-r})$ for all $(A, B) \in \mathcal{T}$

$$|\Delta_G(A, B)| < \epsilon p\,|C(A, B)|. \qquad (7)$$

Now consider any pair $(A, B) \in \mathcal{R}$. Summing (7) over all $(A', B') \in \mathcal{I}(A, B)$ and using properties Q1.1-5, we get

$$|\Delta_G(A, B)| < \sum_{(A', B') \in \mathcal{I}(A, B)} \epsilon p\,|C(A', B')| = \epsilon p\,|C(A, B)|$$

for all $(A, B) \in \mathcal{R}$ as required. ■

5.2 Decomposition of the graph G

Corollary 5.6, which relates the size of sufficiently unbalanced witness sets in the sampled graph to the size of the corresponding cuts, is the main result of this subsection. It follows from Theorem 5.2 and a stronger (than [8]) decomposition of bipartite $d$-regular graphs that we outline now.

Theorem 5.3 Any $d$-regular graph $G$ with $2n$ vertices can be decomposed into vertex-disjoint induced subgraphs $G_1 = (P_1, Q_1, E_1), G_2 = (P_2, Q_2, E_2), \ldots, G_k = (P_k, Q_k, E_k)$, where $k \le 4n/d + 1$, that satisfy the following properties:

1. The minimum cut in each $G_i$ is at least $d/8$.

2. $\sum_{i=1}^{k} |\delta_G(V(G_i))| \le 2n$.

To prove Theorem 5.3, we give a procedure that decomposes the graph $G$ into vertex-disjoint induced subgraphs $G_1(P_1, Q_1, E_1), G_2(P_2, Q_2, E_2), \ldots, G_k(P_k, Q_k, E_k)$, $k \le 4n/d + 1$, such that the min-cut in $G_i$ is at least $d/8$ and at most $n$ edges run between pieces of the decomposition.

The procedure is as follows. Initialize $H_1 := G$, and set $i := 1$.

1. Find a smallest proper subset $X_i \subset V(H_i)$ such that $|\delta_{H_i}(X_i)| < d/4$. If no such set exists, define $G_i$ to be the graph $H_i$ and terminate.

2. Define $G_i$ to be the subgraph of $H_i$ induced by vertices in $X_i$, i.e. $X_i = P_i \cup Q_i = V(G_i)$. Also, define $H_{i+1}$ to be the graph $H_i$ with vertices from $X_i$ removed.

3. Increment $i$ and go to step 1.
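The procedure above can be traced on very small graphs with a brute-force search. The sketch below is only an illustration of the three steps, not an efficient algorithm: it enumerates all subsets in order of increasing size, so it is exponential in the number of vertices, and the adjacency-dictionary input format and the function names are assumptions made here. (The decomposition is used only in the analysis, so no efficient implementation is needed.)

```python
from itertools import combinations


def cut_size(adj, inside, alive):
    """Number of edges of the current graph H_i leaving the vertex set `inside`."""
    return sum(1 for u in inside for v in adj[u] if v in alive and v not in inside)


def decompose(adj, d):
    """Repeatedly peel off a smallest proper subset X_i with cut below d/4.

    adj maps each vertex to a list of its neighbours in a d-regular graph.
    Returns the list of vertex sets V(G_1), ..., V(G_k).
    """
    alive = set(adj)                 # vertex set of the current graph H_i
    pieces = []
    while alive:
        order = sorted(alive)
        found = None
        for size in range(1, len(order)):          # proper, non-empty subsets
            for cand in combinations(order, size):
                s = set(cand)
                if cut_size(adj, s, alive) < d / 4:
                    found = s
                    break
            if found is not None:
                break
        if found is None:            # step 1: no such set, H_i is the last piece
            pieces.append(alive)
            break
        pieces.append(found)         # step 2: G_i is induced by X_i
        alive = alive - found        # H_{i+1} = H_i with X_i removed
    return pieces


if __name__ == "__main__":
    # Two disjoint copies of K_{3,3}: a 3-regular bipartite graph on 12 vertices.
    adj = {}
    for tag in "ab":
        left = [tag + "p" + str(i) for i in range(3)]
        right = [tag + "q" + str(i) for i in range(3)]
        for u in left:
            adj[u] = list(right)
        for v in right:
            adj[v] = list(left)
    print([sorted(piece) for piece in decompose(adj, 3)])
```

On this example the only subsets with cut below $d/4$ are whole components, so the procedure returns the two copies of $K_{3,3}$.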

We now prove that the output of the decomposition procedure satisfies the properties claimed above.

Lemma 5.4 The min-cut in $G_i$ is greater than $d/8$.

Proof: If $G_i$ contains a single vertex the min-cut is infinite by definition, so we assume wlog that $G_i$ contains at least two vertices. The proof is essentially the same as the proof of property P1 of the decomposition procedure in [8] (see Theorem 2.4).

Suppose that there exists a cut $(V, \bar{V})$ in $G_i$, where $V \subset V(G_i)$ and $\bar{V} = V(G_i) \setminus V$, such that $|\delta_{G_i}(V)| < d/8$ (note that it is possible that $V \cap P \neq \emptyset$ and $V \cap Q \neq \emptyset$). We have $|\delta_{H_i}(V) \setminus \delta_{G_i}(V)| + |\delta_{H_i}(\bar{V}) \setminus \delta_{G_i}(\bar{V})| < d/4$ by the choice of $X_i$ in (1). Suppose without loss of generality that $|\delta_{H_i}(V) \setminus \delta_{G_i}(V)| < d/8$. Then $|\delta_{H_i}(V)| < d/4$ and $V \subset X_i$, which contradicts the choice of $X_i$ as the smallest cut of value less than $d/4$ in step (1) of the procedure. ■

Lemma 5.5 The number of steps in the decomposition procedure is $k \le 4n/d$, and at most $n$ edges are removed in the process.

Proof: We call a vertex $v \in V(G_i)$ bad if its degree in $G_i$ is smaller than $d/2$. Note that for each $1 \le i \le k$ either $G_i$ contains a bad vertex or $|V(G_i)| \ge d$.

Note that since strictly fewer than $d/4$ edges are removed in each iteration, the number of bad vertices created in the first $j$ iterations is strictly less than $j(d/4)/(d/2) = j/2$. Hence, during at least half of the $j$ iterations at least $d$ vertices were removed from the graph, i.e.

$$\sum_{i=1}^{j} |V(G_i)| \ge (j/2) \cdot d = jd/2.$$

Since $G$ has only $2n$ vertices, this implies that the process terminates in at most $4n/d$ steps, and the number of edges removed is at most $(4n/d) \cdot d/4 = n$. ■

Proof of Theorem 5.3: The proof follows by putting together Lemmas 5.4 and 5.5. ■

We overload notation here by denoting $W(B, A) = W(P \setminus A, Q \setminus B) = C(A, B) \setminus W(A, B)$ for $A \subseteq P$, $B \subseteq Q$. The main result of this subsection is

Corollary 5.6 Let $G^* = (P, Q, E^*)$ be a graph obtained by sampling the edges of a $d$-regular bipartite graph $G = (P, Q, E)$ on $2n$ vertices independently with probability $p$. There exists a constant $c > 0$ such that if $p > \frac{c \ln n}{\epsilon^2 d}$ then whp for all pairs $(A, B)$, $A \subseteq P$, $B \subseteq Q$, $|A| \ge |B| + 2n/d$ one has that $|W(A, B) \cap E^*| > \frac{1 - 3\epsilon}{1 + \epsilon} |W(B, A) \cap E^*|$ for all $\epsilon < 1/4$. In particular, $G^*$ contains a matching of size at least $n - 2n/d$ whp.

Proof: Set $A_i = A \cap P_i$, $B_i = B \cap Q_i$, where $G_i = (P_i, Q_i, E_i)$ are the pieces of the decomposition obtained in Section 5.2. For each $(A_i, B_i)$ such that $G_i$ is not an isolated vertex we have by Lemma 5.4 and Theorem 5.2

$$\bigl|\,|W_{G_i}(A_i, B_i) \cap E^*| - p\,|W_{G_i}(A_i, B_i)|\,\bigr| < \epsilon p\,|C_{G_i}(A_i, B_i)|.$$

If $G_i$ is an isolated vertex, we have $|W_{G_i}(A_i, B_i) \cap E^*| = p\,|W_{G_i}(A_i, B_i)| = 0$. Since the latter estimate is stronger than the former, we shall not consider the isolated vertices separately in what follows. Adding these inequalities over all $i$ we get

$$\sum_{i=1}^{k} |W_{G_i}(A_i, B_i) \cap E^*| > p \sum_{i=1}^{k} |W_{G_i}(A_i, B_i)| - \epsilon p \sum_{i=1}^{k} |C_{G_i}(A_i, B_i)|. \qquad (8)$$

Denote the set of edges removed during the decomposition process by $E_r$. Denote $E_1 = E_r \cap W(A, B)$ and $E_2 = E_r \cap W(B, A)$. Since $|W(A, B) \cap E^*| = \sum_{i=1}^{k} |W_{G_i}(A_i, B_i) \cap E^*| + |E_1 \cap E^*|$ and $\sum_{i=1}^{k} |W_{G_i}(A_i, B_i)| = |W(A, B)| - |E_1|$, this implies

$$|W(A, B) \cap E^*| > p\,|W(A, B)| - \epsilon p\,|C(A, B)| - p\,|E_1|.$$

Likewise, since $W(B, A) = W(P \setminus A, Q \setminus B)$, we have

$$|W(B, A) \cap E^*| < p\,|W(B, A)| + \epsilon p\,|C(A, B)| + p\,|E_2|.$$

Since $|A| \ge |B| + 2n/d$, we have $|W(A, B)| \ge |W(B, A)| + 2n$, so

$$\begin{aligned}
|W(A, B) \cap E^*| &> p\,|W(A, B)| - \epsilon p\,|C(A, B)| - p\,|E_1| \\
&\ge p\,(|W(B, A)| + 2n) - \epsilon p\,|C(A, B)| - p\,|E_1| - p\,|E_2| \\
&> |W(B, A) \cap E^*| - 2\epsilon p\,|C(A, B)| + p\,(2n - |E_r|) \\
&\ge |W(B, A) \cap E^*| - 2\epsilon p\,|C(A, B)| + pn.
\end{aligned}$$

By similar arguments $|C(A, B) \cap E^*| \ge (1 - \epsilon)\,p\,(|C(A, B)| - n)$, i.e. $p\,|C(A, B)| \le \frac{1}{1 - \epsilon} |C(A, B) \cap E^*| + pn$. Hence, we have

$$\begin{aligned}
|W(A, B) \cap E^*| &> |W(B, A) \cap E^*| - 2\epsilon p\,|C(A, B)| + pn \\
&\ge |W(B, A) \cap E^*| - \frac{2\epsilon}{1 - \epsilon}\,|C(A, B) \cap E^*| + (1 - 2\epsilon)\,pn \\
&\ge |W(B, A) \cap E^*| - \frac{2\epsilon}{1 - \epsilon}\bigl(|W(A, B) \cap E^*| + |W(B, A) \cap E^*|\bigr) + (1 - 2\epsilon)\,pn,
\end{aligned}$$

which implies

$$|W(A, B) \cap E^*| > \frac{1 - 3\epsilon}{1 + \epsilon}\,|W(B, A) \cap E^*|$$

for $\epsilon < 1/4$. This completes the proof. ■
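The final implication in the proof above is a one-line rearrangement. As a short check, write $a = |W(A, B) \cap E^*|$ and $b = |W(B, A) \cap E^*|$ (shorthand introduced only here), and assume the coefficient in the last displayed chain is $\frac{2\epsilon}{1-\epsilon}$:

```latex
\begin{align*}
a &\ge b - \frac{2\epsilon}{1-\epsilon}\,(a + b) + (1 - 2\epsilon)\,pn \\
\Longrightarrow\quad \frac{1+\epsilon}{1-\epsilon}\,a &\ge \frac{1-3\epsilon}{1-\epsilon}\,b + (1 - 2\epsilon)\,pn \\
\Longrightarrow\quad a &\ge \frac{1-3\epsilon}{1+\epsilon}\,b + \frac{(1-\epsilon)(1-2\epsilon)}{1+\epsilon}\,pn
\;>\; \frac{1-3\epsilon}{1+\epsilon}\,b .
\end{align*}
```

The last step uses $pn > 0$ and $\epsilon < 1/4$, which keeps both $(1 - 2\epsilon)$ and $(1 - 3\epsilon)$ positive.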



Remark 5.7 The result in corollary 5.6 is tight up to an 0(ln d) factor for d = ^{^/n)

Proof: The following construction gives a lower bound of $n - \Omega\left(\frac{n}{d \ln d}\right)$. Denote by $G_{n,d}$ the graph from Theorem 4.1 in [8] and denote by $G^*_{n,d}$ a graph obtained by sampling edges of $G_{n,d}$ at the rate of for a constant $c > 0$. Define the graph $G$ as $d$ disjoint copies of $G_{2d \ln d,\, d}$, and denote the sampled graph by $G^*$. Note that by Theorem 4.1 the maximum matching in each copy of $G^*_{2d \ln d,\, d}$ has size at most $2d \ln d - 1$ whp, and since the number of vertices in $G$ is $n = 2d^2 \ln d$, the maximum matching in $G^*$ has size at most $n - \Omega\left(\frac{n}{d \ln d}\right)$ whp. ■

5.3 Runtime analysis of the Hopcroft-Karp algorithm 

In this section we derive a bound on the runtime of the Hopcroft-Karp algorithm on the subsampled graph obtained in step S2 of our algorithm. The main object of our analysis is the alternating level graph, which we now define. Given a partial matching of a graph $G = (P, Q, E)$, the alternating level graph is defined inductively. Define sets $A_j$ and $B_j$, $j = 1, \ldots, L$ as follows. Let $A_0$ be the set of unmatched vertices in $P$ and let $B_0 = \emptyset$. Then let $B_{j+1} = \Gamma(A_j) \setminus \bigl(\bigcup_{i \le j} B_i\bigr)$, where $\Gamma(A)$ is the set of neighbours of vertices in $A \subseteq V(G)$, and let $A_j$ be the set of vertices matched to vertices from $B_j$. The construction terminates when either $B_{j+1}$ contains an unmatched vertex or when $B_{j+1} = \emptyset$, and then we set $L = j$. We use the notation $A^{(j)} = \bigcup_{k \le j} A_k$, $B^{(j)} = \bigcup_{k \le j} B_k$.

We now give an outline of the Hopcroft-Karp algorithm for convenience of the reader. Given a non-maximum matching, the algorithm starts by constructing the alternating level graph described above and stops when an unmatched vertex is found. Then the algorithm finds a maximal set of vertex-disjoint augmenting paths of length $L$ (this can be done by depth-first search in $O(m)$ time) and performs the augmentations, thus completing one augmentation phase. It can be shown that each augmentation phase increases the length of the shortest augmenting path. Standard analysis of the run-time for general bipartite graphs is based on the observation that once $\sqrt{n}$ augmentation phases have been performed, the constructed matching necessarily has size at most $\sqrt{n}$ smaller than the maximum matching.
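The outline above corresponds to the usual breadth-first/depth-first implementation of Hopcroft-Karp. The following is a minimal sketch under stated assumptions: left vertices are numbered $0, \ldots, |P|-1$, `adj[u]` lists the right neighbours of left vertex `u`, and the function also returns the number of augmentation phases, which is the quantity the analysis in this section bounds. It is a textbook-style version, not the authors' code; the recursive search is fine for small inputs but would need an explicit stack for long augmenting paths.

```python
from collections import deque

INF = float("inf")


def hopcroft_karp(adj, n_right):
    """Maximum matching in a bipartite graph; returns (match_left, phases)."""
    n_left = len(adj)
    match_l = [-1] * n_left      # match_l[u] = right vertex matched to u, or -1
    match_r = [-1] * n_right     # match_r[v] = left vertex matched to v, or -1
    dist = [INF] * n_left        # level of each left vertex in the level graph
    phases = 0

    def bfs():
        """Build the alternating level graph; return the level at which an
        unmatched right vertex is first reached (INF if none is reachable)."""
        queue = deque()
        for u in range(n_left):
            if match_l[u] == -1:
                dist[u] = 0
                queue.append(u)
            else:
                dist[u] = INF
        free_level = INF
        while queue:
            u = queue.popleft()
            if dist[u] >= free_level:
                continue         # deeper levels cannot lie on a shortest path
            for v in adj[u]:
                w = match_r[v]
                if w == -1:
                    free_level = min(free_level, dist[u] + 1)
                elif dist[w] == INF:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        return free_level

    def dfs(u, free_level):
        """Search for a shortest augmenting path from u along level-graph edges."""
        for v in adj[u]:
            w = match_r[v]
            if w == -1:
                ok = dist[u] + 1 == free_level
            else:
                ok = dist[w] == dist[u] + 1 and dfs(w, free_level)
            if ok:
                match_l[u], match_r[v] = v, u
                return True
        dist[u] = INF            # dead end: prune u for the rest of this phase
        return False

    while True:
        free_level = bfs()
        if free_level == INF:
            return match_l, phases
        phases += 1
        for u in range(n_left):
            if match_l[u] == -1:
                dfs(u, free_level)


if __name__ == "__main__":
    print(hopcroft_karp([[0, 1], [0], [1, 2]], 3))
```

The levels computed by `bfs` play the role of the sets $A_j$ (left vertices at level $j$) in the alternating level graph.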



We denote the graph obtained by sampling edges of $G$ independently with probability $p = \frac{c \ln n}{d}$ for a constant $c > 0$ by $G^*$. Note that $G^*$ is obtained from $G$ by uniform sampling. We will make the connection to non-uniform sampling in Theorem 5.10. For $A \subseteq V(G)$ denote the set of edges in the cut $(A, V(G) \setminus A)$ in $G$ by $\delta(A)$ and the set of edges in the same cut in $G^*$ by $\delta^*(A)$. Similarly, we denote the vertex neighbourhood of $A$ in $G$ by $\Gamma(A)$ and the vertex neighbourhood in $G^*$ by $\Gamma^*(A)$. We consider the alternating level graph in $G^*$ and prove that whp for any partial matching of size smaller than $n - 2n/d$, for each $1 \le j \le L$ either $|B_{j-1} \cup B_j \cup B_{j+1}| = \Omega(d)$ or $B_j$ expands by at least a factor of $\ln n$ in either forward or backward direction ($|B_{j+1}| > (\ln n)|B_j|$ or $|B_{j-1}| > (\ln n)|B_j|$). This implies that $L = O\left(\frac{n \ln d}{d \ln\ln n}\right)$, thus yielding the same bound on the length of the shortest augmenting path by virtue of Corollary 5.6. The main technical result of this subsection is

Lemma 5.8 Let the graph $G^*$ be obtained from the bipartite $d$-regular graph $G$ on $2n$ vertices by uniform sampling with probability $p$. There exist constants $c > 0$, $\epsilon > 0$ such that if $p > \frac{c \ln n}{d}$, then whp for any partial matching in $G^*$ of size smaller than $n - 2n/d$ there exists an augmenting path of length $O\left(\frac{n \ln d}{d \ln\ln n}\right)$.

The following expansion property of the graph $G^*$ will be used to prove Lemma 5.8.

Lemma 5.9 Define $\gamma(t) = (1 - \exp(-t))/t$. For all $t > 0$ there exists a constant $c > 0$ that depends on $t$ and $\epsilon$ such that if $G^*$ is obtained by sampling the edges of $G$ independently with probability $p > \frac{c \ln n}{d}$ then whp for every set $A \subseteq P$, $|A| < t/p$ (resp. $B \subseteq Q$, $|B| < t/p$)

$$|\Gamma^*(A)| > (1 - \epsilon)\, d p\, \gamma(t)\, |A|.$$

Proof: Consider a set $A \subseteq P$, $|A| < t/p$. For $b \in \Gamma(A)$ denote the indicator variable corresponding to the event that at least one edge incident on $b$ and going to $A$ is sampled by $X_b$, i.e. $X_b = \mathbf{1}_{\{b \in \Gamma^*(A)\}}$. Denote the number of edges between $b$ and vertices of $A$ by $k_b$. We have

$$\Pr[X_b = 1] = 1 - (1 - p)^{k_b} \ge 1 - \exp(-k_b p) \ge k_b\, p\, \gamma(t),$$

since $k_b p < t$ and $e^{-x} \le 1 - \gamma(t) x$ for $x \in [0, t]$. Hence,

$$\mathbf{E}\Bigl[\sum_{b \in \Gamma(A)} X_b\Bigr] \ge p\,\gamma(t) \sum_{b \in \Gamma(A)} k_b = p\,|\delta(A)|\,\gamma(t). \qquad (9)$$

There are at most $n^s$ subsets $A$ of $P$ of size $s$ and $|\delta(A)| = d|A|$ for all $A$, so we obtain using Chernoff bounds and the union bound

$$\Pr\bigl[\exists A \subseteq P : |\Gamma^*(A)| < (1 - \epsilon)\, p d\, |A|\, \gamma(t)\bigr] < \sum_{s=1}^{n} n^s \exp\bigl(-c\,\gamma(t)\, s \ln n\bigr) = \sum_{s=1}^{n} \exp\bigl(s\,(1 - c\,\gamma(t)) \ln n\bigr) = O\bigl(n^{1 - c\gamma(t)}\bigr),$$

which can be made $O(n^{-r})$ by choosing $c > (2 + r)/\gamma(t)$ for any $r > 0$. ■
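The inequality $e^{-x} \le 1 - \gamma(t) x$ on $[0, t]$ used in the proof above is a direct consequence of convexity. As a short justification (a standard argument, spelled out here for convenience): every $x \in [0, t]$ is the convex combination $x = (1 - x/t) \cdot 0 + (x/t) \cdot t$, and $e^{-x}$ is convex, so it lies below the chord through $(0, 1)$ and $(t, e^{-t})$:

```latex
\[
e^{-x} \;\le\; \Bigl(1 - \frac{x}{t}\Bigr) e^{0} + \frac{x}{t}\, e^{-t}
\;=\; 1 - \frac{1 - e^{-t}}{t}\, x
\;=\; 1 - \gamma(t)\, x,
\qquad x \in [0, t].
\]
```

Applying this with $x = k_b p$ gives $1 - \exp(-k_b p) \ge k_b\, p\, \gamma(t)$.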
Proof of Lemma 5.8: First note that since the partial matching is of size strictly less than $n - 2n/d$, by Corollary 5.6 there exists an augmenting path with respect to the partial matching.

In order to upper bound the length of the shortest augmenting path, we will show that for each $j$, at least one of the following is true:

1. $|B_j| > d/500$;

2. $|B_{j+1}| > d/500$;

3. $|B_{j+1}| > (\ln n)|B_j|$;

4. $|B_{j-1}| > d/500$;

5. $|B_{j-1}| > (\ln n)|B_j|$.

It then follows that for each $j$ there exists $j'$ such that $|j - j'| \le 1 + \frac{\ln d}{\ln\ln n}$ and $|B_{j'}| > d/500$. Hence, there cannot be more than $O\left(\frac{n \ln d}{d \ln\ln n}\right)$ levels in the alternating level graph, so there always exists an augmenting path of length $O\left(\frac{n \ln d}{d \ln\ln n}\right)$.

For each $1 \le j \le L$, where $L$ is the number of levels in the alternating level graph, we classify the edges leaving $B_j$ into three classes: (1) $E_F$ contains edges that go to $P \setminus A^{(j)}$, (2) $E_M$ contains edges that go to $A_j$, and (3) $E_R$ contains edges that go to $A_{j-1}$. At least one of $E_F$, $E_M$, $E_R$ has at least $(1 - \epsilon)\,pd\,|B_j|/3$ edges by Lemma 5.9. We now consider each of these possibilities.

Case (A): First suppose that $E_F$ contains at least $(1 - \epsilon)\,pd\,|B_j|/3$ edges. Note that since the partial matching has size smaller than $n - 2n/d$ by assumption, we have that $|A^{(j)}| \ge |B^{(j)}| + 2n/d$. Hence, by Corollary 5.6 the number of edges going from $A_j$ to $B_{j+1}$ is at least

$$\frac{(1 - 3\epsilon)(1 - \epsilon)}{1 + \epsilon}\; pd\,|A_j|/3.$$

Suppose first that $|A_j| \le 1/(5p)$. Then by Lemma 5.9 one has that $|\Gamma^*(A_j)| \ge (1 - \epsilon)\,\gamma(1/5)\,pd\,|A_j|$. Let $\beta^* = 1 + \epsilon - \frac{(1 - 3\epsilon)(1 - \epsilon)}{3(1 + \epsilon)}$. Observe that since one edge going out of $A_j$ yields at most one neighbour, at most

$$(1 + \epsilon)\,pd\,|A_j| - \frac{(1 - 3\epsilon)(1 - \epsilon)}{1 + \epsilon}\; pd\,|A_j|/3 = \beta^*\, pd\,|A_j|$$

neighbours of vertices of $A_j$ are outside $B_{j+1}$. Setting $\epsilon = 1/15$, we get that $B_{j+1}$ contains at least $((1 - \epsilon)\,\gamma(1/5) - \beta^*)\,pd\,|A_j| > 0.011\,pd\,|A_j| > (\ln n)|A_j|$ neighbours of $A_j$, i.e. $|B_{j+1}| > (\ln n)|A_j| = (\ln n)|B_j|$ (this corresponds to case 3 above). Now if $|A_j| > 1/(5p)$, one can find $A' \subseteq A_j$ such that $|A'| = \lfloor 1/(5p) \rfloor$ and at least $\frac{(1 - 3\epsilon)(1 - \epsilon)}{1 + \epsilon}\,pd\,|A'|/3$ edges going out of $A'$ go to $B_{j+1}$, which implies by the same argument that $|B_{j+1}| > 0.011\,pd\,|A'| > d/500$ (this corresponds to case 2 above).

Case (B): Suppose that $E_M$ contains at least $(1 - \epsilon)\,pd\,|B_j|/3$ edges. Then by the same argument as in the previous paragraph (after first weakening our estimate to $\frac{(1 - 3\epsilon)(1 - \epsilon)}{1 + \epsilon}\,pd\,|B_j|/3$) we have that $|A_j| > (\ln n)|B_j|$ if $|B_j| \le 1/(5p)$. This is impossible when $\ln n > 1$ since $|A_j| = |B_j|$. Hence, $|A_j| > d/500$ by the same argument as above, and hence $|B_j| > d/500$ (this corresponds to case 1 above).

Case (C): Suppose that $E_R$ contains at least $(1 - \epsilon)\,pd\,|B_j|/3$ edges. By the same argument as above we have that either $|B_{j-1}| > d/500$ (this corresponds to case 4 above) or $|B_{j-1}| > (\ln n)|B_j|$ (this corresponds to case 5 above).

This completes the proof. ■
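The numerical constants in Case (A) can be verified directly. The sketch below assumes the reading $\beta^* = 1 + \epsilon - \frac{(1-3\epsilon)(1-\epsilon)}{3(1+\epsilon)}$ and $\gamma(t) = (1 - e^{-t})/t$; the function names are hypothetical and introduced only for this check.

```python
import math


def gamma(t):
    """gamma(t) = (1 - exp(-t)) / t, as defined in Lemma 5.9."""
    return (1.0 - math.exp(-t)) / t


def beta_star(eps):
    """Fraction of pd|A_j| that may fall outside B_{j+1} (assumed form)."""
    return 1 + eps - (1 - 3 * eps) * (1 - eps) / (3 * (1 + eps))


def expansion_constant(eps, t):
    """Coefficient of pd|A_j| in the lower bound on |B_{j+1}|."""
    return (1 - eps) * gamma(t) - beta_star(eps)


if __name__ == "__main__":
    eps, t = 1 / 15, 1 / 5
    print("beta* =", beta_star(eps))
    print("expansion constant =", expansion_constant(eps, t))
    # With |A'| close to 1/(5p), a constant of 0.011 gives 0.011 * d / 5 = 0.0022 d.
    print("0.011 / 5 =", 0.011 / 5, " vs 1/500 =", 1 / 500)
```

With $\epsilon = 1/15$ the value of $\beta^*$ is $5/6$ and the expansion constant is about $0.0126$, which is above the $0.011$ used in the proof; likewise $0.011/5 = 0.0022$ exceeds $1/500$.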
We can now prove the main result of this section:

Theorem 5.10 Let the graph $G''$ be obtained from $G$ using steps S1 and S2 in the algorithm of Section 4. Then step S3 takes $O\left(\frac{n^2 \ln n \ln d}{d \ln\ln n}\right)$ time whp, giving a time of $O\left(\frac{n^2 \ln^3 n}{d}\right)$ for the entire algorithm.

Proof: We analyze the runtime of step S3 in two stages: (1) finding a matching of size $n - 2n/d$, and (2) extending the matching of size $n - 2n/d$ to a perfect matching.

Note that the strength of edges in $G'$ obtained after S1 does not exceed the maximum degree, which is $O(n \ln n / d)$ with high probability. Hence, the combination of sampling uniformly in S1 and non-uniformly in S2 dominates sampling each edge with probability $\Omega\left(\frac{\ln n}{d}\right)$, so we write $G'' = (P, Q, E^* \cup E^{**})$, where $E^*$ is obtained from $E$ by sampling uniformly with probability $p = \frac{c \ln n}{d}$ for a sufficiently large $c > 0$. The constant $c > 0$ can be made sufficiently large so that Lemma 5.8 applies by adjusting the constant in the sampling in steps S1 and S2. Denote $G^* = (P, Q, E^*)$ and note that the proof of Lemma 5.8 only uses lower bounds on the number of edges incident to vertices in a given set, as well as the number of vertex neighbours of a set of vertices. Hence, since all bounds apply to $G^*$, the conclusion of the lemma is valid for $G''$ whp as well, and we conclude that the maximum number of layers in an alternating level graph, and hence the length of the shortest augmenting path, is $O\left(\frac{n \ln d}{d \ln\ln n}\right)$. Each augmentation phase takes time proportional to the number of edges in $G''$, which is $O(n \ln n)$ whp, so the first stage takes $O\left(\frac{n^2 \ln n \ln d}{d \ln\ln n}\right)$ time.

Finally, note that each augmentation phase increases the size of the matching by at least 1, and thus $O(n/d)$ augmentations suffice to extend the matching constructed in the first stage to a perfect matching. This takes $O\left(\frac{n^2 \ln n}{d}\right)$ time, so the runtime is $O\left(\frac{n^2 \ln n \ln d}{d \ln\ln n}\right)$ for step S3, and $O\left(\frac{n^2 \ln^3 n}{d}\right)$ overall. ■



Remark 5.11 Theorem 5.10 as well as Lemma 5.8 can be slightly altered to show that the runtime of the



Hopcroft-Karp algorithm on the subsampled graph from [8 J is O y^^i^j- This shows that the approach 
in yields an 0{rJ'^^) algorithm, which is better than 0(n^'^^) stated in 



Theorem 5.12 For any function $d(n) \ge 2\sqrt{n}$ there exists an infinite family of $d(n)$-regular graphs with $2n + o(n)$ vertices such that whp the algorithm in Section 4 performs $\Omega(n/d)$ augmentations in the worst case.

Proof: In what follows we omit the dependence of $d$ on $n$ for brevity. Define $H^{(k)} = (U, V, E)$, $0 \le k \le d$, to be a $(d - k)$-regular bipartite graph with $|U| = |V| = d$. The graph $G$ consists of $t$ copies of $H^{(k)}$, which we denote by $\{H_j\}_{j=1}^{t}$, where $H_j = H^{(t - j + 1)}$, and $2t$ vertices $u_1, \ldots, u_t$ and $v_1, \ldots, v_t$. Each of $u_1, \ldots, u_t$ is connected to all $d$ vertices in the $V$-part of $H_1$, and for $1 \le j \le t$, the vertex $v_j$ is connected to all vertices in the $U$-part of $H_j$. The remaining connections are established by adding $t - j$ edge-disjoint perfect matchings between the $U$ part of $H_j$ and the $V$ part of $H_{j+1}$ for all $1 \le j < t$.

Set $t = n/d \le \sqrt{n}/2 \le d/4$. Note that the strength of edges in $H_j$ is at least $d/4$, so whp there exists a perfect matching in the subgraph of $H_j$ generated by the sampling steps S1 and S2, for $1 \le j \le t$. Suppose that at the first iteration of the Hopcroft-Karp algorithm a perfect matching is found in each $H_j$, thus leaving unmatched the vertices $u_1, \ldots, u_t$ and $v_1, \ldots, v_t$. Then from this point on, the shortest augmenting path for each pair $(u_j, v_j)$ has length $j$, and each augmentation phase of the Hopcroft-Karp algorithm will increase the size of the matching by 1. Hence, it takes $t$ augmentations to find a perfect matching. The number of vertices is $2(d + 1)t = 2n + o(n)$. ■



6 Perfect Matchings in Doubly Stochastic Matrices 

An $n \times n$ matrix $A$ is said to be doubly stochastic if every element is non-negative, and every row-sum and every column-sum is 1. The celebrated Birkhoff-von Neumann theorem says that every doubly stochastic matrix is a convex combination of permutation matrices (i.e., matchings). Surprisingly, the running time of computing this convex combination (known as a Birkhoff-von Neumann decomposition) is typically reported as $O(m^2 \sqrt{n})$, even though much better algorithms can be easily obtained using existing techniques or very simple modifications. We list these running times here since there does not seem to be any published record.⁴ After listing the running times that can be obtained using existing techniques, we will show how proportionate uncrossings can be applied to this problem to obtain a slight improvement.

1. An $O(m^2)$-time algorithm for finding a Birkhoff-von Neumann decomposition can be obtained by finding a perfect matching in the existing graph using augmenting paths (in time $O(mn)$), assigning this matching a weight which is the weight of the smallest edge in the matching, subtracting this weight from every edge in the matching (causing one or more edges to be removed from the support of $A$), and continuing the augmenting path algorithm without restarting. When a matching is found, if we remove $k$ edges then we need to find only $k$ augmenting paths (finding each augmenting path takes time $O(m)$) to find another matching, which leads to a total time of $O(m^2)$.

2. Let $b$ be the maximum number of significant bits in any entry of $A$. An $O(mb)$-time algorithm for finding a single perfect matching in the support of a doubly stochastic matrix can be easily obtained using the technique of Gabow and Kariv [7]: repeatedly find Euler tours in edges where the lowest order bit (say bit $j$) is 1, and then increase the weight of all edges going from left to right by $2^{-j}$ and decrease the weight of all edges going from right to left by the same amount, where the directionality of edges corresponds to an arbitrary orientation of the Euler tour; this eliminates bit $j$ while preserving the doubly stochastic property and without increasing the support.

3. An $O(mnb)$-time algorithm to compute the Birkhoff-von Neumann decomposition can be obtained using the edge coloring algorithm of Gabow and Kariv [7].
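The peel-off idea behind item 1 can be sketched in a few lines. The version below is a simplified illustration and does not attain the $O(m^2)$ bound: it recomputes a perfect matching from scratch in every round (using a simple augmenting-path search) instead of continuing from the previous matching. Exact rational arithmetic via `fractions.Fraction` is an assumption made here to avoid rounding issues; the function names are hypothetical.

```python
from fractions import Fraction


def perfect_matching(support, n):
    """Augmenting-path matching; support[i] lists columns j with A[i][j] > 0.
    Returns perm with perm[i] = column matched to row i, or None if no
    perfect matching exists in the support."""
    match_col = [-1] * n                 # match_col[j] = row matched to column j

    def try_row(i, seen):
        for j in support[i]:
            if j not in seen:
                seen.add(j)
                if match_col[j] == -1 or try_row(match_col[j], seen):
                    match_col[j] = i
                    return True
        return False

    for i in range(n):
        if not try_row(i, set()):
            return None
    perm = [0] * n
    for j, i in enumerate(match_col):
        perm[i] = j
    return perm


def birkhoff_decomposition(matrix):
    """Return a list of (weight, perm) pairs whose weighted permutation
    matrices sum to the given doubly stochastic matrix."""
    n = len(matrix)
    a = [[Fraction(x) for x in row] for row in matrix]
    terms = []
    while any(x > 0 for row in a for x in row):
        support = [[j for j in range(n) if a[i][j] > 0] for i in range(n)]
        perm = perfect_matching(support, n)
        if perm is None:
            raise ValueError("matrix is not doubly stochastic")
        weight = min(a[i][perm[i]] for i in range(n))   # smallest matched entry
        for i in range(n):
            a[i][perm[i]] -= weight      # removes at least one support entry
        terms.append((weight, perm))
    return terms


if __name__ == "__main__":
    half = Fraction(1, 2)
    A = [[half, half, 0], [half, 0, half], [0, half, half]]
    for weight, perm in birkhoff_decomposition(A):
        print(weight, perm)
```

Each round removes at least one entry from the support, so at most $m$ rounds are performed.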

We now show how our techniques lead to an 0{m\T? n + n^'^ lnn)-time algorithm for finding a single 
perfect matching in the support of a doubly stochastic matrix. In realistic scenarios, this is unlikely to be better 
than (2) above, and we present this primarily to illustrate another application of our proportionate uncrossing 
technique. First, define a weighted bipartite graph G = (P, Q, E), where P = {ui,U2, ■ ■ ■ n„} corresponds 
to rows of A, Q = {^1,^2, • • • ,Vn] corresponds to columns of A, and {ui,Vj) G E iff Aij > 0. Define 
a weight function w on edges, with w{ui,Vj) = Aij. Let TZ be the collection of all pairs {A^B),A C 
P, B Q ,\P\ > \Q\. Since A is doubly stochastic, the collection TZ is (1 /2)-thick with respect to (G, w, E). 

^This list was compiled by Bhattacharjee and Goel and is presented here to provide some context rather than as original work. 
Let $\mathcal{T}$ be a $(1/2)$-uncrossing of $\mathcal{R}$. Performing a Benczur-Karger sampling on $G$ will guarantee (with high 
probability) that at least one edge is sampled from every witness set in $W(\mathcal{T})$, and hence running the Hopcroft-Karp 
algorithm on the sampled graph will yield a perfect matching with high probability. The running time 
of $O(m \ln^3 n + n^{1.5} \ln n)$ is just the sum of the running times of Benczur-Karger sampling for weighted 
graphs and the Hopcroft-Karp matching algorithm [9]. 
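
As a small illustration of the final matching step, the sketch below builds the support graph of a doubly stochastic matrix and extracts a perfect matching from it. It is only a sketch under stated assumptions: it skips the sampling stage entirely and uses a simple augmenting-path search (Kuhn's algorithm) in place of Hopcroft-Karp, so it does not achieve the running time discussed above. The function name `support_perfect_matching`, the tolerance `eps`, and the example matrix are hypothetical.

```python
def support_perfect_matching(A, eps=1e-12):
    """Return match_row, where match_row[i] is the column matched to row i,
    using only entries with A[i][j] > eps (hypothetical helper)."""
    n = len(A)
    # Support graph: row i is adjacent to column j iff A[i][j] > 0.
    adj = [[j for j in range(n) if A[i][j] > eps] for i in range(n)]
    match_col = [-1] * n  # match_col[j] = row currently matched to column j

    def augment(i, seen):
        # Search for an augmenting path starting at the unmatched row i.
        for j in adj[i]:
            if j in seen:
                continue
            seen.add(j)
            if match_col[j] == -1 or augment(match_col[j], seen):
                match_col[j] = i
                return True
        return False

    for i in range(n):
        if not augment(i, set()):
            raise ValueError("support graph has no perfect matching")

    match_row = [-1] * n
    for j, i in enumerate(match_col):
        match_row[i] = j
    return match_row


# A 3 x 3 doubly stochastic matrix (example data, not from the paper).
A = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
matching = support_perfect_matching(A)
```

The simpler search is used here only to keep the example short; the sampled graph in the text would be passed to Hopcroft-Karp instead.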
A Proof of Lemma 2.3



Consider any $(A, B)$ where $|A| > |B|$, $A \subseteq P$, $B \subseteq Q$. Define $A_i = P_i \cap A$ and $B_i = Q_i \cap B$. Fix an 
$i$ such that $|A_i| > |B_i|$; such an $i$ is guaranteed to exist. By the definition of relevance, there exists a pair 
$(X, Y) \in \mathcal{R}$ such that $X \subseteq A_i$, and $W(X, Y) \cap E_R \subseteq W(A_i, B_i) \cap E_R$. By the assumption in the theorem, 
there exists an edge $(u, v) \in E^* \cap E_R \cap W(X, Y)$. Since $W(X, Y) \cap E_R \subseteq W(A_i, B_i) \cap E_R$, it follows that 
$(u, v) \in E^* \cap E_R \cap W(A_i, B_i)$. This edge is in $G^*$, and goes from $A_i$ to $Q_i \setminus B_i$, i.e., from $A_i$ to $Q_i \setminus (Q_i \cap B)$, 
and hence, from $A$ to $Q \setminus B$. Since the only assumption on $(A, B)$ was that $|A| > |B|$, we can now invoke 
Hall's theorem to claim that $G^*$ has a perfect matching. ■ 
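
The proof ends by invoking Hall's condition: every set of left vertices must have at least as many neighbours on the right. For very small graphs this condition can be checked by brute force, as in the following sketch. The function name `satisfies_hall` and the adjacency-set representation are assumptions made for illustration; the enumeration is exponential in the number of left vertices and is not part of the algorithm analyzed here.

```python
from itertools import combinations


def satisfies_hall(adj):
    """adj[u] is the set of right-side neighbours of left vertex u.
    Returns True iff every non-empty set A of left vertices has |N(A)| >= |A|."""
    left = range(len(adj))
    for size in range(1, len(adj) + 1):
        for A in combinations(left, size):
            neighbours = set().union(*(adj[u] for u in A))
            if len(neighbours) < size:
                return False  # Hall's condition fails for this A
    return True


ok = satisfies_hall([{0, 1}, {0, 2}, {1, 2}])  # 2-regular graph: condition holds
bad = satisfies_hall([{0}, {0}, {1, 2}])       # left vertices 0 and 1 share one neighbour
```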



B Proof of Theorem 2.6 



As mentioned before, the proof is along very similar lines to that of the Benczur-Karger sampling theorem, 
but does not follow in a black-box fashion and is presented here for completeness. The proof relies on the 
following result due to Karger and Stein [12]: 



Lemma B.1 Let $H(V, E)$ be an undirected graph on $n$ vertices such that each edge $e$ has an associated 
non-negative weight $p_e$. Let $s^*$ be the value of the minimum cut in $H$ under the weight function $p_e$. Then for any 
$\alpha \ge 1$, the number of cuts in $H$ of weight at most $\alpha s^*$ is less than $n^{2\alpha}$. 
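
The cut-counting bound of Karger and Stein can be sanity-checked on a tiny graph by enumerating all cuts. The sketch below is an illustration only: the example graph (a 5-cycle with unit weights), the function name `cut_weights`, and the choice of alpha = 2 are assumptions, the enumeration takes exponential time, and each unordered cut is counted once.

```python
from itertools import combinations


def cut_weights(n, edges):
    """Weights of all cuts of a weighted graph on vertices 0..n-1.
    Each cut (S, V minus S) is counted once by requiring vertex 0 to lie in S."""
    weights = []
    for size in range(0, n - 1):  # S never equals the whole vertex set
        for rest in combinations(range(1, n), size):
            S = {0, *rest}
            weights.append(sum(w for (u, v, w) in edges if (u in S) != (v in S)))
    return weights


n = 5
edges = [(i, (i + 1) % n, 1.0) for i in range(n)]  # unit-weight 5-cycle
weights = cut_weights(n, edges)
s_star = min(weights)                              # minimum cut value
alpha = 2
num_small = sum(1 for w in weights if w <= alpha * s_star)
bound = n ** (2 * alpha)                           # the bound n^(2 alpha)
```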

Proof of Theorem 2.6: We will choose $c = 5$. The first part of the proof shows that it is sufficient to bound a 
certain expression that involves only cuts. The second part then bounds this expression. 

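For the first part, let $\mu(X) = \sum_{e \in X} p_e$ denote the expected number of edges chosen from $X$ by the 
sampling process. If a set $X \in \mathcal{X}$ contains an edge $e$ with $p_e = 1$, then that edge will definitely be chosen, 
and that set does not contribute to 

$$\sum_{X \in \mathcal{X}} \Pr[\text{No edge in } X \text{ is chosen in } H']$$

and can be removed from $\mathcal{X}$. Hence, assume without loss of generality that $p_e < 1$ for every edge in $\bigcup_{X \in \mathcal{X}} X$. 
Define $\hat{\mu}(X) = \sum_{e \in X} (\ldots)$. Now for any set $X \in \mathcal{X}$, 

$$\Pr[\text{No edge in } X \text{ is chosen in } H'] = \prod_{e \in X} (1 - p_e) \le \prod_{e \in X} e^{-p_e} = e^{-\mu(X)} \le e^{-\hat{\mu}(f(X))},$$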

where 

Since $f$ is a one-one function, it is sufficient to provide an upper bound on $\sum_{C \in \mathcal{C}} e^{-\hat{\mu}(C)}$. 

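For the second part, let $\hat{\mu}_1, \hat{\mu}_2, \ldots, \hat{\mu}_{2^n - 2}$ be a non-decreasing sorted sequence corresponding to the 
multiset $\{\hat{\mu}(C) : C \in \mathcal{C}\}$. Define $q_i = e^{-\hat{\mu}_i}$. Consider an arbitrary cut $C$. Any edge in $C$ can have strength at most 
$|C|$, and hence $\hat{\mu}(C) \ge c \ln n$, and therefore $q_i \le n^{-c}$. So the sum of $q_i$ for the first $n^2$ cuts in the sequence 
is bounded by $n^{2-c}$. We now focus on the remaining cuts. By Lemma B.1 we know that for any $\alpha \ge 1$, we 
have $\hat{\mu}_{n^{2\alpha}} \ge \alpha \hat{\mu}_1$. Hence, for every $k > n^2$, 

$$\hat{\mu}_k \ge \frac{\ln k}{2 \ln n}\, \hat{\mu}_1 \ge \frac{c \ln k}{2},$$

which in turn implies that $q_k \le k^{-c/2}$. Thus 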
$$\sum_{X \in \mathcal{X}} \Pr[\text{No edge in } X \text{ is chosen in } H'] \le \sum_{C \in \mathcal{C}} e^{-\hat{\mu}(C)} = \sum_{k=1}^{n^2} q_k + \sum_{k > n^2} q_k \le n^{2-c} + \sum_{k > n^2} k^{-c/2} = O(n^{2-c}),$$

giving us the desired result when we choose $c = 5$. ■ 
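
For completeness, a tail sum of the form $\sum_{k > n^2} k^{-c/2}$ can be bounded by an integral. The following is a sketch of that routine calculation, assuming $c > 2$; it is added for illustration and is not taken from the original text.

```latex
\sum_{k > n^2} k^{-c/2}
  \;\le\; \int_{n^2}^{\infty} x^{-c/2}\,dx
  \;=\; \frac{(n^2)^{1 - c/2}}{c/2 - 1}
  \;=\; \frac{2}{c - 2}\, n^{2-c}.
```

With $c = 5$ the right-hand side is $\tfrac{2}{3} n^{-3}$.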



C Proof of Lemma 2.8 



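Assume by way of contradiction that no such integer $j$ exists for some pair of multisets $S_1$ and $S_2$. Let $K$ be 
the largest integer in $S_1 \cup S_2$, and let $\alpha_i$ and $\beta_i$ denote the number of occurrences of $i$ in the multisets $S_1$ and 
$S_2$ respectively. Then for all $j \ge 1$, we have 

$$\sum_{i=j}^{K} \frac{\alpha_i}{i} \le \gamma \left( \sum_{i=j}^{K} \frac{\beta_i}{i} \right).$$

Summing the above inequality for all $j \in \{1..K\}$, we get 

$$\sum_{i=1}^{K} \alpha_i \le \gamma \left( \sum_{i=1}^{K} \beta_i \right),$$

which is a contradiction since $|S_1| > \gamma |S_2|$ by assumption. 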
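
The lemma can be checked numerically on small multisets. The sketch below assumes, as the summation argument suggests, that the integer $j$ in question satisfies $\sum_{i \ge j} \alpha_i / i > \gamma \sum_{i \ge j} \beta_i / i$; this reading of the statement, the function name `find_threshold`, and the example multisets are assumptions made for illustration.

```python
from collections import Counter
from fractions import Fraction


def find_threshold(S1, S2, gamma):
    """Return the smallest j >= 1 with
    sum_{i >= j} alpha_i / i  >  gamma * sum_{i >= j} beta_i / i,
    or None if no such j exists (assumed reading of the lemma)."""
    alpha, beta = Counter(S1), Counter(S2)
    K = max(list(S1) + list(S2))
    for j in range(1, K + 1):
        lhs = sum(Fraction(alpha[i], i) for i in range(j, K + 1))
        rhs = gamma * sum(Fraction(beta[i], i) for i in range(j, K + 1))
        if lhs > rhs:
            return j
    return None


S1 = [1, 1, 3, 3, 3]      # |S1| = 5
S2 = [1, 1, 1, 2]         # |S2| = 4
gamma = Fraction(1, 1)    # |S1| > gamma * |S2| holds
j = find_threshold(S1, S2, gamma)
```

Exact rational arithmetic is used so that the strict inequality is not affected by floating-point rounding.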